Why pad or truncate text in CNNs?

I have observed that RNNs are not very efficient for classification tasks such as sentiment analysis, so people tend to use CNNs for this type of task instead. However, the input to a CNN must have a fixed size, which is why text passages are truncated or padded. Does this really work well? Has anyone tried running the text through an RNN to get a fixed-size representation and then feeding that representation to a CNN? Is there any paper on this?
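For reference, here is a minimal Python sketch of the truncate-or-pad step being asked about. It assumes the text has already been tokenized into integer ids and that id 0 is reserved for padding. Both are assumptions, and `pad_or_truncate`, `batch_pad` and `max_len` are hypothetical names used only for illustration. Long sequences keep their first `max_len` tokens, and short ones are right-padded, so every row ends up the same length.

```python
from typing import List

PAD_ID = 0  # hypothetical: token id reserved for padding


def pad_or_truncate(token_ids: List[int], max_len: int, pad_id: int = PAD_ID) -> List[int]:
    """Return exactly max_len ids: cut long sequences, right-pad short ones."""
    if max_len <= 0:
        raise ValueError("max_len must be positive")
    clipped = token_ids[:max_len]  # truncate: keep only the first max_len tokens
    # pad on the right until the sequence reaches max_len
    return clipped + [pad_id] * (max_len - len(clipped))


def batch_pad(batch: List[List[int]], max_len: int) -> List[List[int]]:
    """Apply pad_or_truncate to every sequence so the batch forms a rectangle."""
    return [pad_or_truncate(seq, max_len) for seq in batch]


if __name__ == "__main__":
    docs = [[5, 8, 13], [2, 7, 1, 9, 4, 6, 3]]
    print(batch_pad(docs, max_len=5))
    # [[5, 8, 13, 0, 0], [2, 7, 1, 9, 4]]
```

Whether to cut or pad at the start or the end of a passage is a design choice that can matter for sentiment, since the decisive words may sit near either end. Keras's `pad_sequences`, for example, exposes this choice through its `padding` and `truncating` arguments.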