Speed up Text Classification Learner CPU Inference

I have no problems predicting with a text classification learner after it has been exported and loaded on GPU.

However, I have been experimenting with the same learner for CPU inference on a very large number of texts - roughly 200M items to predict.

I have noticed some odd behaviors in how learner.get_preds uses the CPUs, and it is very slow…

  • the n_batch parameter does not actually do anything; we have to set learner.data.test_dl.batch_size to increase the batch size (see the sketch at the end of this post)
  • no matter how many CPUs the machine has, only half of them are used for the predictions; I tested 16-, 32-, 72- and 96-core instances, and it always uses half…
  • I tried setting learner.data.test_dl.num_workers to see whether that split could be changed, since I believe it affects the number of CPUs used for batch preparation, but it does not do anything…
  • I also tried using ray to parallelise prediction, but it seems the learner object cannot be pickled properly. Any good practices for speeding this up?
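
For reference, here is roughly the call pattern in question - a minimal sketch assuming a fastai v1 text classifier; model_path, 'export.pkl' and texts are placeholders for the actual export location and input list:

from fastai.text import *  # fastai v1

learn = load_learner(model_path, 'export.pkl')  # placeholder path and filename

# Register the texts as a test set, then run batched inference on CPU
learn.data.add_test(texts)
learn.data.test_dl.batch_size = 512   # n_batch had no effect, but this does change the batch size
learn.data.test_dl.num_workers = 8    # does not seem to change CPU usage for me
preds, _ = learn.get_preds(ds_type=DatasetType.Test, ordered=True)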

Ideally, if you have a large number of items, add them as a test set and call learn.get_preds(ds_type=DatasetType.Test) so it uses your GPU (or CPU, as best it can).

Thanks for the reply - worst case, I will use the GPU instead. However, I do not understand why it always uses only half of the CPUs. Any ideas?

Ah, my bad! I misread - you are already doing that :slight_smile: As for only half the CPUs being used, that is strange; I’m unsure about it.

@muellerzr @sgugger Have you been able to find a workaround for the padding issues causing decreased performance with get_preds() versus predict()?

For my use case of multi-label classification on text data, I’m noticing that most of the records that get_preds() fails to categorize are correctly predicted by predict(). However, the runtime of predict() is too long to use in production - it takes around 7 hours for 50k records, compared to ~1.5 min with get_preds().

Here’s my inference code for both approaches (you can ignore the MLflow parts :stuck_out_tongue:):

Using predict():

import pandas as pd
import mlflow
import mlflow.sklearn
from fastai.text import *  # fastai v1: load_learner, etc.

input_df = pd.read_csv(filename, encoding="ISO-8859-1", dtype='object').dropna(subset=['text'])

def predict_labels(new_data):
  # Update this with the production Experiment ID
  experiment_id = '4105035027016985'

  # Find the 'best' run and load model components
  top_run = mlflow.search_runs(
      experiment_ids=experiment_id,
      order_by=["metrics.`weighted avg precision` DESC"]
  ).loc[0, ['run_id', 'metrics.weighted avg f1-score',
            'metrics.weighted avg precision', 'metrics.weighted avg recall']]
  model_uri = '/dbfs/databricks/mlflow/' + experiment_id + '/' + top_run['run_id'] + '/artifacts/fastai_model'
  learn = load_learner(model_uri, 'fastai.pkl')
  mlb_uri = 'dbfs:/databricks/mlflow/' + experiment_id + '/' + top_run['run_id'] + '/artifacts/MultiLabelBinarizer'
  mlb = mlflow.sklearn.load_model(mlb_uri)
  
  predictions = []
  
  def predict_apply(text):
    # learn.predict() handles one raw text string at a time
    results = learn.predict(text)
    agg_results = str(results[0])
    predictions.append(results[1].numpy().tolist())
    return agg_results
  
  new_data['Predictions'] = new_data['text'].apply(predict_apply)
    
  colnames = mlb.classes_.tolist()
  colnames = [i.replace('lbl_','') for i in colnames]
  
  predictions_df = pd.DataFrame(predictions, columns=colnames)
  
  new_data = new_data.join(predictions_df)
  
  return new_data
  
input_df = predict_labels(input_df) 
input_df.head(5)

input_df.to_csv(path + 'results_predict.csv',index=False)

Using get_preds():

import numpy as np
import pandas as pd
import mlflow
import mlflow.sklearn
from fastai.text import *  # fastai v1: load_learner, DatasetType, etc.

input_df = pd.read_csv(filename, encoding="ISO-8859-1", dtype='object').dropna(subset=['text'])

def predict_labels(new_data):
  experiment_id = '4105035027016985'

  # Find the 'best' run and load model components
  top_run = mlflow.search_runs(
      experiment_ids=experiment_id,
      order_by=["metrics.`weighted avg precision` DESC"]
  ).loc[0, ['run_id', 'metrics.weighted avg f1-score',
            'metrics.weighted avg precision', 'metrics.weighted avg recall']]
  model_uri = '/dbfs/databricks/mlflow/' + experiment_id + '/' + top_run['run_id'] + '/artifacts/fastai_model'
  learn = load_learner(model_uri, 'fastai.pkl')
  mlb_uri = 'dbfs:/databricks/mlflow/' + experiment_id + '/' + top_run['run_id'] + '/artifacts/MultiLabelBinarizer'
  mlb = mlflow.sklearn.load_model(mlb_uri)
   
  learn.data.add_test(new_data['text'])
  predictions, _ = learn.get_preds(ds_type=DatasetType.Test, ordered=True)

  predictions = predictions.numpy()
  predictions = np.where(predictions > 0.4, 1, 0)

  colnames = mlb.classes_.tolist()
  colnames = [i.replace('lbl_','') for i in colnames]

  # Use column names to build array into DataFrame
  predictions_df = pd.DataFrame(predictions, columns=colnames)

  # Add aggregated tagging column
  predictions_df['Predictions'] = mlb.inverse_transform(predictions)
  predictions_df['Predictions'] = [', '.join(map(str, l)) for l in predictions_df['Predictions']]
  predictions_df['Predictions'].replace('', 'Not Categorized', inplace=True)

  new_data = new_data.join(predictions_df)
  return new_data

input_df = predict_labels(input_df)
input_df.to_csv(path + 'results_getpreds.csv',index=False)

Which version of fastai are you using? I am pretty sure there is no ordered argument anymore in my case, so there might be some differences.

I’m using the most recent release, v1.0.59 (2019-10-26) - the ordered argument still exists for the text.learner version of get_preds(), but otherwise yes, it has been removed elsewhere.

The padding issue has already been addressed for v2; it will stay as it is in v1.


Do you have any idea why only half of the CPUs are used during inference? Is that because the CPUs have to be split evenly between batch preparation and predictions?
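
As a side note, PyTorch exposes its CPU thread-pool size directly, so it can at least be inspected and overridden - a minimal diagnostic sketch using plain torch calls (nothing fastai-specific); I have not verified whether this actually changes how get_preds splits the work:

import os
import torch

# Logical CPUs the machine reports vs. threads PyTorch will use for intra-op work
print(os.cpu_count(), torch.get_num_threads())

# Ask PyTorch to use every logical CPU for intra-op parallelism
torch.set_num_threads(os.cpu_count())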

Thanks for the update @sgugger! Awesome to hear. When are you targeting the v2 release?

When it’s ready :slight_smile:
