Speed Up Text Classification Learner CPU Inference

(Martin Bai) #1

I have no problems predicting with a text classification learner after it has been exported and loaded on a GPU.

However, I have been experimenting with the same learner for CPU inference on a very large number of texts, roughly 200M items to predict.

I have noticed some odd behavior in how `learner.get_preds` uses the CPUs, as it is very slow:

  • the `n_batch` parameter does not actually do anything; I have to set `learner.data.test_dl.batch_size` to increase the batch size
  • no matter how many CPUs the machine has, only half of them are used for prediction; I tested instances with 16, 32, 72 and 96 cores, and it always uses half
  • I tried changing `learner.data.test_dl.num_workers` to see whether that split could be affected, as I believed it would change how many CPUs are used for batch preparation, but it does not do anything (a rough sketch of what I am running is below this list)
  • I also tried to use ray to parallelise prediction, but it seems the learner object cannot be pickled properly; any good practices for speeding this up?
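For reference, here is roughly what I am running (a minimal sketch; the paths, the `texts` variable, and the batch size are placeholders from my setup):

from fastai.text import *   # fastai v1: load_learner, DatasetType

# load the exported learner on a CPU-only machine (path/filename are placeholders)
learn = load_learner('/path/to/model_dir', 'export.pkl')

# `texts` is a list / pandas Series of the items to predict (in practice, a chunk of the ~200M)
learn.data.add_test(texts)

# `n_batch` did not seem to change anything, so set the batch size on the test DataLoader directly
learn.data.test_dl.batch_size = 512
learn.data.test_dl.num_workers = 8   # tried various values; no visible effect on CPU usage

preds, _ = learn.get_preds(ds_type=DatasetType.Test, ordered=True)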
0 Likes

(Zachary Mueller) #2

Ideally, if you have a large number of items, add them as a test set and call `learn.get_preds(ds_type=DatasetType.Test)` so it uses your GPU (or CPU, as best it can).
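Something like this (just a sketch; `texts` stands for whatever list or Series holds your items):

learn.data.add_test(texts)                              # register the items to predict as the test set
preds, _ = learn.get_preds(ds_type=DatasetType.Test)    # batched inference on GPU if available, otherwise CPU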

0 Likes

(Martin Bai) #3

Thanks for the reply; worst case, I will use a GPU instead. However, I do not understand why it always uses only half of the CPUs. Any ideas?

0 Likes

(Zachary Mueller) #4

Ah, my bad! I misread; I see you are already doing that :slight_smile: As for the half-CPU usage, that is strange; I'm unsure about it.

0 Likes

(Matt Kalebic) #5

@muellerzr @sgugger Have you been able to find a workaround for the padding issues that cause decreased performance with get_preds() versus predict()?

For my use case of multi-label classification on text data, I’m noticing that most of the records that get_preds() fails to categorize are correctly predicted by predict(). However, the runtime of predict() is too long for production: it takes around 7 hours for 50k records, compared to ~1.5 min for get_preds().

Here’s my inference code for both approaches (you can ignore the MLflow parts :stuck_out_tongue:).

Using predict():

import pandas as pd
import mlflow
import mlflow.sklearn
from fastai.text import *   # fastai v1: load_learner, etc.

# `filename` and `path` are defined elsewhere in the notebook
input_df = pd.read_csv(filename, encoding="ISO-8859-1", dtype='object').dropna(subset=['text'])

def predict_labels(new_data):
  # Update this with the production Experiment ID
  experiment_id = '4105035027016985'

  # Find the 'best' run and load model components
  top_run = mlflow.search_runs(experiment_ids=experiment_id, order_by=["metrics.`weighted avg precision` DESC"]).loc[0, ['run_id', 'metrics.weighted avg f1-score', 'metrics.weighted avg precision', 'metrics.weighted avg recall']]
  model_uri = '/dbfs/databricks/mlflow/' + experiment_id + '/' + top_run['run_id'] + '/artifacts/fastai_model'
  learn = load_learner(model_uri, 'fastai.pkl')
  mlb_uri = 'dbfs:/databricks/mlflow/' + experiment_id + '/' + top_run['run_id'] + '/artifacts/MultiLabelBinarizer'
  mlb = mlflow.sklearn.load_model(mlb_uri)

  predictions = []

  # learn.predict() is called once per row, which is what makes this approach so slow
  def predict_apply(text):
    results = learn.predict(text)
    agg_results = str(results[0])
    predictions.append(results[1].numpy().tolist())
    return agg_results

  new_data['Predictions'] = new_data['text'].apply(predict_apply)

  colnames = mlb.classes_.tolist()
  colnames = [i.replace('lbl_', '') for i in colnames]

  predictions_df = pd.DataFrame(predictions, columns=colnames)

  # Join the per-label columns back onto the input rows (aligned by index)
  new_data = new_data.join(predictions_df)

  return new_data

input_df = predict_labels(input_df)
input_df.head(5)

input_df.to_csv(path + 'results_predict.csv', index=False)

Using get_preds():

import numpy as np
import pandas as pd
import mlflow
import mlflow.sklearn
from fastai.text import *   # fastai v1: load_learner, DatasetType, etc.

input_df = pd.read_csv(filename, encoding="ISO-8859-1", dtype='object').dropna(subset=['text'])

def predict_labels(new_data):
  # Update this with the production Experiment ID
  experiment_id = '4105035027016985'

  # Find the 'best' run and load model components
  top_run = mlflow.search_runs(experiment_ids=experiment_id, order_by=["metrics.`weighted avg precision` DESC"]).loc[0, ['run_id', 'metrics.weighted avg f1-score', 'metrics.weighted avg precision', 'metrics.weighted avg recall']]
  model_uri = '/dbfs/databricks/mlflow/' + experiment_id + '/' + top_run['run_id'] + '/artifacts/fastai_model'
  learn = load_learner(model_uri, 'fastai.pkl')
  mlb_uri = 'dbfs:/databricks/mlflow/' + experiment_id + '/' + top_run['run_id'] + '/artifacts/MultiLabelBinarizer'
  mlb = mlflow.sklearn.load_model(mlb_uri)

  # Add all texts as a test set and predict them in batches
  learn.data.add_test(new_data['text'])
  predictions, _ = learn.get_preds(ds_type=DatasetType.Test, ordered=True)

  # Threshold the sigmoid outputs into binary label indicators
  predictions = predictions.numpy()
  predictions = np.where(predictions > 0.4, 1, 0)

  colnames = mlb.classes_.tolist()
  colnames = [i.replace('lbl_', '') for i in colnames]

  # Use column names to build array into DataFrame
  predictions_df = pd.DataFrame(predictions, columns=colnames)

  # Add aggregated tagging column
  predictions_df['Predictions'] = mlb.inverse_transform(predictions)
  predictions_df['Predictions'] = [', '.join(map(str, l)) for l in predictions_df['Predictions']]
  predictions_df['Predictions'].replace('', 'Not Categorized', inplace=True)

  # Join per-label columns and aggregated tags back onto the input rows (aligned by index)
  new_data = new_data.join(predictions_df)
  return new_data

input_df = predict_labels(input_df)
input_df.to_csv(path + 'results_getpreds.csv', index=False)
0 Likes

(Martin Bai) #6

Which version of fastai are you using? I am pretty sure there is no `ordered` argument anymore in my case, so there might be some differences.

0 Likes

(Matt Kalebic) #7

I’m using the most recent release, v1.0.59 (2019-10-26). The `ordered` argument still exists for the text.learner version of get_preds(), but otherwise, yes, it has been removed elsewhere.

0 Likes

#8

The padding issue has already been addressed for v2; it will stay as it is in v1.

0 Likes

(Martin Bai) #9

Do you have any idea why only half of the CPUs are used during inference? Is that because the CPUs have to be split evenly between batch preparation and prediction?
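For context, here is how I am checking the thread settings (a sketch; I am assuming PyTorch's intra-op thread count is the relevant knob here):

import os
import torch

# Threads PyTorch uses for intra-op parallelism; with an MKL/OpenMP backend this often
# defaults to the number of physical cores, i.e. half the logical cores when hyper-threading is on.
print(torch.get_num_threads(), os.cpu_count())

torch.set_num_threads(os.cpu_count())   # try forcing all logical cores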

0 Likes

(Matt Kalebic) #10

Thanks for the update @sgugger! Awesome to hear. When do you plan to release v2?

0 Likes

#11

When it’s ready :slight_smile:

1 Like