How to use auxiliary targets

Hi everyone,
I am working through the "Linear model and neural net from scratch" notebook from lesson 5. If I also want to use auxiliary targets during training in this project, how would I do it?

Hey, if you post a little more detail and also what code you want to adapt, we might be able to help better. :slight_smile:

I am working on a project that has training data, test data and metadata. The training and test data have the same columns, and I don’t add or merge the metadata directly, so I want to use the metadata’s target column as an auxiliary target. I followed the steps from "Linear model and neural net from scratch" with some changes, but I get errors. My code and the error are below:

def train_model(epochs=30, lr=0.01, n_coeff=56, t_aux_targets=t_aux_targets):
    (layer1, layer2, layer3), const = init_coeffs(n_coeff)
    coeffs = (layer1, layer2, layer3), const
    for i in range(epochs):
        one_epoch(coeffs, lr=lr)
    return coeffs

coeffs = train_model(18, lr=0.2, n_coeff=56, t_aux_targets=t_aux_targets)

RuntimeError Traceback (most recent call last)

  6         one_epoch(coeffs, lr=lr)
  7     return coeffs

----> 8 coeffs = train_model(18, lr=0.2, n_coeff=56, t_aux_targets=t_aux_targets)

Cell In[139], line 6, in train_model(epochs, lr, n_coeff, t_aux_targets)
4 coeffs = (layer1, layer2, layer3), const
5 for i in range(epochs):
----> 6 one_epoch(coeffs, lr=lr)
7 return coeffs

Cell In[134], line 2, in one_epoch(coeffs, lr)
1 def one_epoch(coeffs, lr):
----> 2 loss = calc_loss(coeffs, trn_indep, trn_dep, t_aux_targets_train)
3 loss.backward()
4 with torch.no_grad():

Cell In[123], line 42, in calc_loss(coeffs, indeps, deps, aux_targets)
41 def calc_loss(coeffs, indeps, deps, aux_targets): #, aux_targets2
----> 42 main_preds, aux_preds = calc_preds(coeffs, indeps, aux_targets) #, aux_preds2=, aux_targets2
43 main_preds = main_preds.clamp(min=1e-6, max=1 - 1e-6)
45 # Calculate the log loss with binary cross-entropy for main targets

Cell In[124], line 7, in calc_preds(coeffs, indeps, aux_targets)
4 res = indeps
5 # Your implementation for calculating predictions using layer1, layer2, layer3, and const
6 # For example:
----> 7 res = torch.matmul(res, layer1)
8 res = torch.matmul(res, layer2)
9 res = torch.matmul(res, layer3)

RuntimeError: mat1 and mat2 shapes cannot be multiplied (493x57 and 56x56)

Fixing the code

The error seems to be that you pass data with shape (n_datapoints,57), but have a weight matrix of size (56,56), so they can’t multiply together. Remember you can only multiply matrices A and B together if A.shape[1] == B.shape[0].

I suspect you’ve added an extra column to your data but haven’t adjusted n_coeff (the first weight matrix is initialized as torch.rand(n_coeff, n_hidden)), so n_coeff is still 56 while your data now has 57 columns.

So your code should run when you set n_coeff correctly. But it’s hard to tell exactly without the full code.
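
One way to avoid this kind of mismatch is to derive n_coeff from the data instead of hard-coding it. Here is a minimal sketch, assuming trn_indep is a 2-D float tensor like the one in your traceback. The random tensor and n_hidden = 56 are stand-ins I made up from the shapes in the error message, not your real data:

```python
import torch

# Stand-in for the real training data: 493 rows, 57 feature columns
# (shapes taken from the error message; the values are random).
trn_indep = torch.rand(493, 57)

n_hidden = 56                  # assumed from the (56x56) in the error
n_coeff = trn_indep.shape[1]   # derive the input width from the data: 57

layer1 = torch.rand(n_coeff, n_hidden)

# A @ B only works if A.shape[1] == B.shape[0]
assert trn_indep.shape[1] == layer1.shape[0]

res = trn_indep @ layer1
print(res.shape)  # torch.Size([493, 56])
```

With n_coeff=56 the same multiplication fails with the error you posted, because 57 != 56.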

Auxiliary targets
Iiuc, you want to predict 2 things at the same time? Then you have to decide how to represent the 2 targets in your output layer, and adjust calc_preds and calc_loss accordingly. One option is to increase the size of the output layer and use one part of it for the first target and the other part for the second target. So something along the lines of:

import torch
import torch.nn.functional as F

n_1st_target = 1  # in the notebook you only predict 1 number
n_2nd_target = ...

def init_coeffs(n_coeff):
    hiddens = [10, 10]
    # we set the size of the output layer to n_1st_target + n_2nd_target
    sizes = [n_coeff] + hiddens + [n_1st_target + n_2nd_target]
    n = len(sizes)
    layers = [(torch.rand(sizes[i], sizes[i+1])-0.3)/sizes[i+1]*4 for i in range(n-1)]
    consts = [(torch.rand(1)[0]-0.5)*0.1 for i in range(n-1)]
    for l in layers+consts: l.requires_grad_()
    return layers,consts

def calc_preds(coeffs, indeps):
    layers,consts = coeffs
    n = len(layers)
    res = indeps
    for i,l in enumerate(layers):
        res = res@l + consts[i]
        if i != n-1: res = F.relu(res)
    # split the output columns (not the rows) into the two targets
    res1 = res[:, :n_1st_target]  # extract 1st part
    res2 = res[:, n_1st_target:]  # extract 2nd part
    return res1,res2

def calc_loss(coeffs, indeps, deps):
    # deps has one row per data point, with the 1st target in the first column(s)
    deps1,deps2 = deps[:, :n_1st_target],deps[:, n_1st_target:]  # extract 1st / 2nd part
    res1,res2 = calc_preds(coeffs, indeps)
    loss1 = torch.abs(res1-deps1).mean()
    loss2 = torch.abs(res2-deps2).mean()
    # here you can weight how important each loss is to you, e.g. by total_loss = loss1 + 2*loss2
    total_loss = loss1 + loss2
    return total_loss
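
If both targets are class labels rather than numbers, the same idea works with a classification loss per part. The following is only a sketch under assumptions of my own: a binary main target, a 4-class auxiliary target, made-up random data, a single hidden layer, and a hypothetical aux_weight parameter. It is not the notebook's code.

```python
import torch
import torch.nn.functional as F

torch.manual_seed(0)

# Hypothetical toy data: 8 features, binary main target, 4-class auxiliary target.
n_rows, n_coeff, n_aux_classes = 200, 8, 4
indeps = torch.rand(n_rows, n_coeff)
main_deps = torch.randint(0, 2, (n_rows,)).float()     # 0/1 labels
aux_deps = torch.randint(0, n_aux_classes, (n_rows,))  # labels 0..3

# One hidden layer; the output layer has 1 column for the main target
# plus n_aux_classes columns for the auxiliary target.
layer1 = ((torch.rand(n_coeff, 10) - 0.5) / 10).requires_grad_()
layer2 = ((torch.rand(10, 1 + n_aux_classes) - 0.5) / 10).requires_grad_()
coeffs = (layer1, layer2)

def calc_preds(coeffs, indeps):
    l1, l2 = coeffs
    res = F.relu(indeps @ l1) @ l2
    return res[:, 0], res[:, 1:]  # main logit, auxiliary logits

def calc_loss(coeffs, indeps, main_deps, aux_deps, aux_weight=0.5):
    main_logit, aux_logits = calc_preds(coeffs, indeps)
    # binary cross-entropy for the 2-class main target
    main_loss = F.binary_cross_entropy_with_logits(main_logit, main_deps)
    # multi-class cross-entropy for the auxiliary target
    aux_loss = F.cross_entropy(aux_logits, aux_deps)
    return main_loss + aux_weight * aux_loss

losses = []
for epoch in range(20):
    loss = calc_loss(coeffs, indeps, main_deps, aux_deps)
    loss.backward()
    with torch.no_grad():
        for layer in coeffs:
            layer.sub_(layer.grad * 0.1)  # plain gradient descent step
            layer.grad.zero_()
    losses.append(loss.item())
```

At prediction time you would only look at the main output, e.g. torch.sigmoid(main_logit) > 0.5. The auxiliary part is there to help training.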

Thank you so much,

I’ll try it. If I run into any further problems, I’ll get back to you.

Hi UmerAdil,

I tried this method but kept getting a lot of errors, so I switched to a simple RandomForestClassifier. When I train on both targets and then compute log_loss, I get the error below again and again. I can’t work out how to set it up, because my real target has 2 classes while my auxiliary target has 4 classes…
Could you please help me set it up?

ValueError: The number of classes in labels is different from that in y_pred. Classes found in labels: [0 1 2 3]

Hey again,

Random forests + Auxiliary targets
Afaik, using auxiliary targets for random forests makes no sense. Iiuc, the purpose of auxiliary targets in deep learning is to give the neural net a “stronger gradient signal”. For example, say a neural net has to do task X, but the task is so difficult that it’s hard for the net to find a good gradient direction to move in. Adding another task Y, such that (i) learning Y is easier (because the gradient signal is clearer) and (ii) being good at Y makes it more likely to be good at X, should make it easier for the net to learn X.

Because random forests don’t use gradients, this doesn’t apply to them. So adding auxiliary targets shouldn’t improve your model.
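
To make the "gradient signal" point concrete, here is a small sketch that compares the gradient reaching a shared layer with and without an auxiliary loss. Everything in it (the random data, the layer sizes, the names shared, head_main and head_aux) is made up for illustration:

```python
import torch

torch.manual_seed(0)

x = torch.rand(16, 5)       # hypothetical inputs
y_main = torch.rand(16, 1)  # hypothetical main target (task X)
y_aux = torch.rand(16, 1)   # hypothetical auxiliary target (task Y)

shared = (torch.rand(5, 4) - 0.5).requires_grad_()     # layer used by both tasks
head_main = (torch.rand(4, 1) - 0.5).requires_grad_()  # output weights for X
head_aux = (torch.rand(4, 1) - 0.5).requires_grad_()   # output weights for Y

def shared_grad(use_aux):
    """Gradient of the loss with respect to the shared layer."""
    hidden = torch.tanh(x @ shared)
    loss = ((hidden @ head_main - y_main) ** 2).mean()
    if use_aux:
        loss = loss + ((hidden @ head_aux - y_aux) ** 2).mean()
    (grad,) = torch.autograd.grad(loss, shared)
    return grad

grad_main_only = shared_grad(use_aux=False)
grad_with_aux = shared_grad(use_aux=True)

# The auxiliary loss adds its own contribution to the shared layer's gradient.
print((grad_with_aux - grad_main_only).abs().max())
```

A random forest has no such shared, gradient-trained layer, so there is nothing for the second target to push on.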

Please do correct me, if I’m mistaken!

Random forests + Multiple targets
If you still want to predict multiple targets, you have to (just as with neural nets) decide how to represent the 2 targets in your output. You could for example predict their tuple.

E.g., if your first targets have classes ["A", "B", "C"] and your second targets have classes [1,2], then you can build new target classes as their product: [("A",1), ("B",1), ("C",1), ("A",2), ("B",2), ("C",2)]. Now you can predict those and then extract the two original targets.
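
The product-class idea can be sketched in plain Python like this. The helper names encode and decode are made up, and the order of the combined classes doesn't matter as long as you use the same mapping in both directions:

```python
from itertools import product

first_classes = ["A", "B", "C"]
second_classes = [1, 2]

# Every (first, second) pair becomes one combined class id.
combined_classes = list(product(first_classes, second_classes))
pair_to_id = {pair: i for i, pair in enumerate(combined_classes)}

def encode(first_targets, second_targets):
    """Turn two label lists into one list of combined class ids."""
    return [pair_to_id[pair] for pair in zip(first_targets, second_targets)]

def decode(combined_ids):
    """Recover the two original label lists from combined class ids."""
    pairs = [combined_classes[i] for i in combined_ids]
    firsts, seconds = zip(*pairs)
    return list(firsts), list(seconds)

y_combined = encode(["A", "C", "B"], [2, 1, 2])
print(y_combined)          # [1, 4, 3]
print(decode(y_combined))  # (['A', 'C', 'B'], [2, 1, 2])
```

You would train the classifier on y_combined and run decode on its predictions to get the two original targets back.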


Hi @UmerAdil ,

I have a problem with my data, can you please check my code? My complete code is below. I want a scatter plot of the training data, with x and z (these are columns) as the dots and the predicted y drawn as a curved line through them, but my code doesn’t give the desired output. I have changed the parameters again and again, but all in vain. :frowning_face:

# Imports assumed from the names used below
import pandas as pd
import tensorflow as tf
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from tensorflow.keras.layers import Dense
from tensorflow.keras.models import Sequential

data = pd.read_csv('3DSinusoidalANN.csv')

# Split data into features and response
X = data[['x', 'z']]  # training data
y = data['y']  # response data

# Scale the data using StandardScaler
scaler_X = StandardScaler()
X_scaled = scaler_X.fit_transform(X)
scaler_y = StandardScaler()
y_scaled = scaler_y.fit_transform(y.values.reshape(-1, 1))

# Split into training and testing sets
X_train, X_test, y_train, y_test = train_test_split(X_scaled, y_scaled, test_size=0.2, random_state=42)

# Create DataFrames after scaling
X_train_scaled_df = pd.DataFrame(X_train, columns=X.columns)
y_train_scaled_df = pd.DataFrame(y_train, columns=['y'])

# Concatenate the two DataFrames
train_df_scaled = pd.concat([X_train_scaled_df, y_train_scaled_df], axis=1)

model = Sequential([
    Dense(64, activation='relu', input_shape=(2,)),  # Input layer with 64 neurons
    Dense(64, activation='relu'),  # Hidden layer with 64 neurons
    Dense(64, activation='relu'),  # Hidden layer with 64 neurons
    Dense(64, activation='relu'),
    Dense(64, activation='relu'),
    Dense(64, activation='relu'),
    Dense(64, activation='relu'),
    Dense(64, activation='relu'),
    Dense(1)  # Output layer with 1 neuron
])

# Compile the model
learning_rate = 0.9
optimizer = tf.keras.optimizers.Adam(learning_rate=learning_rate)
model.compile(optimizer=optimizer, loss='mean_squared_error')

optimizer = tf.keras.optimizers.Adam(learning_rate=0.001)
#model.compile(optimizer='adam', loss='mean_squared_error')

# Train the model
history = model.fit(X_train_scaled_df, y_train_scaled_df, validation_split=0.2, epochs=5000, batch_size=1024, verbose=20)

y_train_pred_scaled = model.predict(X_train_scaled_df)
y_train_pred = scaler_y.inverse_transform(y_train_pred_scaled)

fit_x = X_train_scaled_df['x'].values
fit_z = X_train_scaled_df['z'].values
fit_y = y_train_pred.flatten()  # Flattening to match the shape

scat_x = X_train_scaled_df['x'].values
scat_z = X_train_scaled_df['z'].values
scat_y = y_train_scaled_df['y'].values.flatten()  # Flattening to match the shape

# plotscatter3Ddata is a plotting helper that is not shown in this post
plotscatter3Ddata(fit_x, fit_y, fit_z, scat_x, scat_y, scat_z)

Hey, I want you to get help, but with all due respect, is this question really a fastai course question?

Your code seems to use tensorflow, which isn’t used in fastai. I think you’ll be better off asking your question in the right forum.

Also, the better you ask questions, the more likely you’ll get help. So, a few tips:

  • only ask about 1 topic per post.
  • show your full code & say what you’ve tried - see How to ask for help
  • you can format code blocks with triple backticks, which makes them easier to read

Wish you all the best!