What exactly is an architecture?

I’ve finished lesson 1 of the fastbook and I’m left confused about what exactly an architecture is.

Early on in the lesson, it states that an architecture is a working implementation of a model.

Later on, the lesson states that an architecture is simply a template for a mathematical function. That seems to contradict the earlier statement, since a working implementation is not the same thing as a template.

Then after that, a formal definition is given:

The template of the model that we are trying to fit; the actual mathematical function that we are passing the input data and parameters to

This formal definition is consistent with what was stated just before it.

I suppose I’m being confused/misled by the initial statement on what an architecture is, which conflicts with what is said later.

I would appreciate it if somebody could clarify what exactly an architecture is and whether I should ignore the initial statement.

From my current understanding, the architecture is a template (the latter definition) whose weights are initialized randomly, but it can also be pretrained (the former definition), in which case the weights have meaning because they were previously trained on a different dataset. You can either train the non-pretrained architecture from scratch or fine-tune the pretrained model for your specific task.

When you create a learner (for example with fastai’s cnn_learner), there’s a pretrained parameter. I believe that toggles the architecture between the two cases.
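To make the template-versus-weights distinction concrete, here is a minimal sketch in plain Python. It is not fastai code: the function names, the `make_model` helper, and the "pretrained" values are all made up for illustration, and the architecture is just a straight line rather than a neural network. The point is only that the functional form stays the same while the starting parameters differ.

```python
import random

def architecture(x, params):
    """The template: a fixed functional form, here the line w * x + b.
    It only gives meaningful output once it is handed parameters."""
    w, b = params
    return w * x + b

def random_params(seed=0):
    """Untrained starting point: the weights carry no meaning yet."""
    rng = random.Random(seed)
    return (rng.uniform(-1, 1), rng.uniform(-1, 1))

# Hypothetical "pretrained" weights, standing in for parameters that were
# already fitted on some other dataset (values invented for this sketch).
PRETRAINED_PARAMS = (2.0, 1.0)

def make_model(pretrained):
    """Hypothetical helper: same architecture, different starting parameters."""
    params = PRETRAINED_PARAMS if pretrained else random_params()
    return lambda x: architecture(x, params)

scratch_model = make_model(pretrained=False)    # would be trained from scratch
pretrained_model = make_model(pretrained=True)  # would be fine-tuned

print(pretrained_model(3.0))  # 2.0 * 3.0 + 1.0 = 7.0
```

Both models call the same `architecture` function; only the parameters they start from are different, which is roughly what I understand the pretrained flag to control.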

Thanks for the explanation! Makes more sense now.

I suppose then pretrained architectures are still templates (that are functional models at the same time) with already [somewhat] optimized parameters.