Help: The data block API in a swifty way

We’re looking for help from experienced Swift users to give us some ideas on the design of the data block API. In Python, it’s done in a very object-oriented way that ends up being a mess of generic types in Swift, so we think there is a better way, probably using a more functional style.

But you may not know the data block API (you should, it’s great, I promise!) so I’ll briefly summarize what we need. At its core we have items, which can be images, texts or plain tensors. We need to apply a chain of functions to them:

  • some of which are applied lazily when we access the item (open the image, because we don’t want the whole dataset in memory; apply random transforms for data augmentation), which I’ll call lazy init and transforms
  • some of which are applied to the whole dataset at creation (tokenize texts, change categories to their numerical index), which I’ll call processors
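The distinction above can be sketched in a few lines of Swift: eager processors run once at creation, while lazy transforms run on each access. All names here are illustrative, not the actual API.

```swift
// Minimal sketch: eager "processors" run once over the whole dataset at
// creation time; lazy "transforms" run every time an item is queried.
struct LazyDataset<Item> {
    private let items: [Item]
    private let transforms: [(Item) -> Item]  // applied lazily, per access

    init(_ raw: [Item],
         processors: [([Item]) -> [Item]] = [],
         transforms: [(Item) -> Item] = []) {
        // processors run eagerly, once, on the full dataset
        self.items = processors.reduce(raw) { acc, proc in proc(acc) }
        self.transforms = transforms
    }

    subscript(i: Int) -> Item {
        // transforms run each time an item is accessed
        transforms.reduce(items[i]) { item, t in t(item) }
    }

    var count: Int { items.count }
}

// Toy usage with Ints standing in for images/texts:
let ds = LazyDataset([1, 2, 3],
                     processors: [{ $0.map { $0 * 10 } }],  // eager, whole dataset
                     transforms: [{ $0 + 1 }])              // lazy, per item
// ds[0] == 11: the processor multiplied eagerly, the transform added on access
```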

Then the basic idea of the data block API is you can mix and match functions (or blocks if you prefer) to execute the following tasks:

  1. gather your inputs
  2. split them between a training and a validation set
  3. label them with any type of labels (note that labels also need the same lazy init/transforms/processors applied to them, as they could be images/texts…)
  4. apply all the processors
  5. when querying for an index, apply the lazy init and transforms, then return the input and its label.
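As a rough sketch of the first three steps in plain functional Swift (hypothetical names, with strings standing in for items):

```swift
struct Split<Item> { var train: [Item]; var valid: [Item] }

// 1. gather your inputs (here: already-collected paths)
let items = ["train/cat/1.jpg", "train/dog/2.jpg", "val/cat/3.jpg"]

// 2. split into training and validation with a predicate
func split<I>(_ items: [I], isValid: (I) -> Bool) -> Split<I> {
    Split(train: items.filter { !isValid($0) },
          valid: items.filter(isValid))
}

// 3. label each item with a labelling function, keeping the split
func label<I, L>(_ s: Split<I>, with f: (I) -> L) -> Split<(I, L)> {
    Split(train: s.train.map { ($0, f($0)) },
          valid: s.valid.map { ($0, f($0)) })
}

let sd  = split(items) { $0.hasPrefix("val/") }
let sld = label(sd) { String($0.split(separator: "/")[1]) }
// sld.train[0] == ("train/cat/1.jpg", "cat")
```

Steps 4 and 5 would then be eager and lazy function application over these collections, respectively.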

The end result can be seen in notebook 11 where we have the code:

//step 1: gather all inputs in the folder path
let il = ItemList(fromFolder: path, extensions: ["jpeg", "jpg"])
//step 2: split by grandparent folder
let sd = SplitData(il) {grandParentSplitter(fName: $0, valid: "val")}
//step 3-4: label and apply all the processors
var procLabel = CategoryProcessor()
let sld = makeLabeledData(sd, fromFunc: parentLabeler, procLabel: &procLabel)

Up until this point, images have never been opened; it’s only when we ask for one that it gets decoded and has its transforms applied (in notebook 11 we use for that, but we would like to replace it with OpenCV).


You may want to take a look at my SwiftAI project, which I posted about a week ago. You may find some things that you like, or maybe you will hate it all, but it might give you some ideas. When I started working on this I looked at the DataBlocks you had created and came to the same conclusion you have, which is that it can be tough to port it directly as-is into Swift.

Anyway, I’m planning to do a bit of a rewrite of what I have, but the basic gist is that I’ve created a DatasetBuilder. This builds the various datasets (train, valid, test). Through extensions you can load the inputs (from folder paths, from CSV, COCO, etc.), then split them, and so on. New functionality is added via extensions: the functionality you want the builder to have is added via functions which add captured closures, which are later executed when you call the build method. Each time the build function is called the datasets are recreated, allowing for re-use. Functions can be added to or removed from the builder via the builderId (shown in the second example), so that there can be a default builder for certain types of projects, tweaked as needed via builderIds.

Below are two examples: the first for classification and the second for bounding boxes. The datasets are what I like to call the logical model. That is, they don’t actually load the images or perform the augmentations; they just hold, for example, the file paths and Y values.

    let datasetBuilder = DatasetBuilder<URL, String>()
        .withFilesFromFolders(parentFolder: folder, extensions: extensions, trainFolder: trainFolder,
                              validFolder: validFolder, testFolder: testFolder)
        .withFileLabelsOfParentFolder(includeTest: testHasLabels)
        .withSample(of: .Train, pct: trainingPct, fixed: fixedSamples)
        .withSample(of: .Valid, pct: validPct, fixed: fixedSamples)
        .withSample(of: .Test, pct: testPct, fixed: fixedSamples)

    if !testHasLabels {
        datasetBuilder.withY(classes[0], type: .Test)
    }

  let datasetBuilder = DatasetBuilder<URL, Y>()
      .withCocoJson(builderId: "TrainCocoJson", atPath: "\(folder)/\(trainCocoJson)", imagesFolder: "\(folder)/\(trainFolder)",
                    largestBBoxOnly: largestBBoxOnly)
      .withCocoJson(builderId: "ValidCocoJson", atPath: "\(folder)/\(validCocoJson)", imagesFolder: "\(folder)/\(validFolder)", type: .Valid,
                    largestBBoxOnly: largestBBoxOnly)
      .withCocoJson(builderId: "TestCocoJson", atPath: "\(folder)/\(testCocoJson)", imagesFolder: "\(folder)/\(testFolder)", type: .Test,
                    largestBBoxOnly: largestBBoxOnly)
      .withSample(of: .Train, pct: trainingPct, fixed: fixedSamples)
      .withSample(of: .Valid, pct: validPct, fixed: fixedSamples)
      .withSample(of: .Test, pct: testPct, fixed: fixedSamples)
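The capture-and-replay mechanism described above (closures stored per builderId, executed on build()) can be sketched like this. Everything here is illustrative rather than the actual SwiftAI code:

```swift
// Minimal sketch of a builder that captures closures keyed by a builderId
// and replays them on build(), so datasets can be recreated on every call.
final class MiniBuilder<Item> {
    private var steps: [(id: String, op: ([Item]) -> [Item])] = []

    @discardableResult
    func with(_ id: String, _ op: @escaping ([Item]) -> [Item]) -> MiniBuilder {
        steps.append((id, op))
        return self
    }

    // removing a step by id, like `.without(builderId:)`
    @discardableResult
    func without(_ id: String) -> MiniBuilder {
        steps.removeAll { $0.id == id }
        return self
    }

    // every call re-runs the captured closures from scratch
    func build(from raw: [Item]) -> [Item] {
        steps.reduce(raw) { acc, step in step.op(acc) }
    }
}

// Toy usage with Ints standing in for dataset items:
let b = MiniBuilder<Int>()
    .with("double") { $0.map { $0 * 2 } }
    .with("dropFirst") { Array($0.dropFirst()) }
// b.build(from: [1, 2, 3]) == [4, 6]
// b.without("dropFirst").build(from: [1, 2, 3]) == [2, 4, 6]
```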

Next come the transforms (badly named; I’m planning to change it to pipeline or something like that), but these are what, for example, load the images, normalize them, flip them, convert them to tensors, etc. Presently they do this one image at a time, but I am going to change them to do a batch at a time. Below is an example that opens the image, resizes it, converts the PIL image to a tensor, normalizes it, flips it and one-hot encodes the Y value. The second one is for bounding boxes. I’m planning to make this more like the DatasetBuilder by using closures instead of classes. These “transforms” (what you call processors) are applied by the DataLoader when you ask for a batch. I call this the physical model.

        Transforms<URL, String, PythonObject, PythonObject>([
            OpenImage(type: imageType),
            ResizeImage(size: imageSize),
            Normalize(divisor: nil, mean: imageMean, std: imageStd),
            Flip(type: .Horizontal),
            ClassLabelToInt(classes: classes)
        ])

        Transforms<URL, Y, PythonObject, PythonObject>([
            OpenImage(type: imageType),
            ResizeImage(size: imageSize),
            Flip(type: .Horizontal),
            Normalize(divisor: nil, mean: imageMean, std: imageStd)
        ])

So the DatasetBuilder loads the “logical” model of your data, whereas the DataLoader loads the “physical” model of the data. This allows easy access to both and makes matching up the two quite easy, for example when running tests and producing nice output. Since I couldn’t use PyTorch’s DataLoader (there is currently no easy way to have Python call Swift, and PyTorch isn’t great at multi-threading), I also implemented my own BatchSamplers (RandomSample, SequentialSampler, etc.).
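A batch sampler of the kind mentioned above only needs to produce batches of indices. A minimal sketch, with hypothetical free functions rather than the project’s sampler classes:

```swift
// Sequential sampling: consecutive index batches, last batch may be short.
func sequentialBatches(count: Int, batchSize: Int) -> [[Int]] {
    stride(from: 0, to: count, by: batchSize).map { start in
        Array(start ..< min(start + batchSize, count))
    }
}

// Random sampling: the same batching over a shuffled permutation of indices.
func randomBatches(count: Int, batchSize: Int) -> [[Int]] {
    let perm = Array(0..<count).shuffled()
    return stride(from: 0, to: count, by: batchSize).map { start in
        Array(perm[start ..< min(start + batchSize, count)])
    }
}

// sequentialBatches(count: 5, batchSize: 2) == [[0, 1], [2, 3], [4]]
```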

Then, what I’ve done is use the Template pattern to provide defaults for most things. I call the top level the “Project”, for example VisionProject, which calls various methods to create a Learner. See the example below from the VisionProject class. Most of these methods are implemented by subclasses, or a default is provided which can be overridden. For example, the DatasetBuilders shown above are returned when getDatasetBuilder() is called for a single-classification project or a bounding box project, and the transforms shown above are returned when getTransforms() is called.

open func learner() -> Learner<X,Y,U,V> {
    let datasetBuilder = getDatasetBuilder()
    let datasets = datasetBuilder.build()
    if classes.count == 0 {
        classes = datasetBuilder.classes!
    }
    let transforms = getTransforms()
    let dataLoaders = DataLoaderSet(datasets: datasets, bs: batchSize, transforms: transforms)
    let model = getSavedModel()
    let lossFunc = getLossFunc()
    let testModel = getTestModel(forModel: model)
    let callbacks = getCallbacks(forModel: model)
    let testCallback = getTestCallback()

    printSummary(dataLoaders: dataLoaders)

    return Learner(dataLoaders: dataLoaders, model: model, lossFunc: lossFunc, optimizer: optimizer,
                   learningRate: learningRate, callbacks: callbacks, testModel: testModel,
                   testCallback: testCallback)
}

What this boils down to is that a Pascal bounding box project can be defined something like this

public class Pascal : LargestBBoxODVP {

    public override init() {

        folder = "./data/pascal"
        validFolder = "train"
        trainCocoJson = "pascal_train2007.json"
        validCocoJson = "pascal_val2007.json"
        testCocoJson = "pascal_test2007.json"
        savedModelPath = "./pascal.pth"
    }
}


Or, if some tweaking is desired, some of the defaults can be overridden and modified, as for DogsCats. In this example the default dataset builder is modified by removing FileLabelsOfParentFolder and adding FileLabelsFromFilename. A fixed split is also done, with 20% of files moved from the training to the validation set.

public class DogsCats : SingleClassICVP {

    public var testResultsFilePath = "./dogs-vs-cats-redux-kaggle-submission.csv"

    public override init() {

        folder = "./data/dogscats"
        classes = ["cat","dog"]
        savedModelPath = "./dogs-cats-model.pth"
    }

    override open func getDatasetBuilder() -> DatasetBuilder<URL, String> {
        let datasetBuilder = super.getDatasetBuilder()
        let at = datasetBuilder.indexOf(builderId: "FileLabelsOfParentFolder")

        return datasetBuilder.without(builderId: "FileLabelsOfParentFolder")
            .withFileLabelsFromFilename(at: at) { String($0.prefix { $0 != "." }) }
            .withFixedSplit(from: .Train, to: .Valid, pct: 0.2)
    }

    override public func getTestCallback() -> TestCallback<URL,String> {
        return DogsCatsReduxKaggleCallback(classes: classes, testResultsFilePath: testResultsFilePath)
    }
}


Would it be an idea to decouple loading the attributes (names of categories, locations of images, other attributes) from how to use them (splitting, processing bounding boxes, transforms) by loading the attributes into an in-memory SQLite db?

Not sure how relevant it is to the question discussed, but from a syntactical point of view, one could take advantage of Swift’s ability to introduce new operators and create something like:

var procLabel = CategoryProcessor()
let dataBunch = (
    ItemList(from: folder, extensions: ["jpeg", "jpg"]) >>>
    SplitData { grandParentSplitter(fName: $0, valid: "val") } >>>
    makeLabeledData(from: parentLabeler, procLabel: procLabel))

This replicates a bit of the behavior of R or functional languages. (Though it will probably be less flexible than “stateful” methods.)

I’ve used this approach to convert fonts:

public typealias Converter<A> = (A) -> A

infix operator >>>: AdditionPrecedence

public func >>> <A> (f1: @escaping Converter<A>, f2: @escaping Converter<A>) -> Converter<A> {
    return { item in f2(f1(item)) }
}

public func >>> (font: UIFont, converter: Converter<UIFont>) -> UIFont {
    return converter(font)
}

public func newFont(_ font: UIFont, with traits: UIFontDescriptorSymbolicTraits) -> UIFont {
    guard let descriptor = font.fontDescriptor.withSymbolicTraits(traits) else {
        fatalError("Cannot build font descriptor with traits \(traits)")
    }
    return UIFont(descriptor: descriptor, size: font.pointSize)
}

public func newFont(_ font: UIFont, withSize size: CGFloat) -> UIFont {
    let descriptor = font.fontDescriptor
    let newFont = UIFont(descriptor: descriptor, size: size)
    return newFont
}

public func italic(_ font: UIFont) -> UIFont { return newFont(font, with: .traitItalic) }
public func bold(_ font: UIFont) -> UIFont { return newFont(font, with: .traitBold) }
public func shallow(_ font: UIFont) -> UIFont { return newFont(font, withSize: 12)}
public func small(_ font: UIFont) -> UIFont { return newFont(font, withSize: 16) }
public func medium(_ font: UIFont) -> UIFont { return newFont(font, withSize: 18) }
public func large(_ font: UIFont) -> UIFont { return newFont(font, withSize: 20) }

So you can use something like:

var someFont: UIFont = .system >>> large >>> bold

The major idea here is to define some protocols and “combinators” to compose these things together. Not sure if this approach could be adapted for data blocks and solve the issue with the bunch of generic types.

The inspiration here comes from languages like Haskell with pretty powerful generics and its strict type system. Probably one can borrow some ideas from there to make the code easy to write and to understand.


The custom infix operator is something that I’ve found to work well when describing processing pipelines, or even directed acyclic graphs in general. For example, I defined an infix operator for --> here:

infix operator --> : AdditionPrecedence
@discardableResult public func --> <T: ImageConsumer>(source: ImageSource, destination: T) -> T {
    source.addTarget(destination)
    return destination
}

that can be used to construct an image processing pipeline (seen in the examples here) like the following:

camera --> gaussianBlur --> sobelEdgeDetection --> renderView

where image frames then will flow from the camera source, through the pipeline, to the visualization output. While I like to avoid going overboard on custom operators, this is a case where people really seem to like the clarity this provides.

With the values-as-functions Swift evolution proposal, I’ve even wondered if you could use something like this to describe the Swift for TensorFlow functional chaining within a model.


For folks interested in “Haskell-ish” Swift programming I highly recommend this series:

It’s kinda mind-blowing :slight_smile:


@clattner let us know if you have any thoughts on this.

+1 for GPUImage, have used it to do some interesting tricks in the past. Composing data/image operations as a collection of operators is a powerful conceptual model.

Me too. For example, an older open source project of mine let you define neural networks in this manner:

let input = Input()

let output = input
        --> Resize(width: 28, height: 28)
        --> Convolution(kernel: (5, 5), channels: 20, activation: relu, name: "conv1")
        --> MaxPooling(kernel: (2, 2), stride: (2, 2))
        --> Convolution(kernel: (5, 5), channels: 50, activation: relu, name: "conv2")
        --> MaxPooling(kernel: (2, 2), stride: (2, 2))
        --> Dense(neurons: 320, activation: relu, name: "fc1")
        --> Dense(neurons: 10, name: "fc2")
        --> Softmax()

let model = Model(input: input, output: output)

Something like Convolution(...) returns a layer object and --> turns it into a tensor object (input and output are also tensors). This is a nice alternative for Python’s ability to call class instances / the Keras functional API.

That looks great @machinethink - similar to how some Julia nets are written. How did you avoid the need to provide the number of input channels to each layer?

The same way that Keras does this. :smiley: This is very much based on the Keras functional API but using Swift syntax. Because Swift doesn’t have a __call__, the --> operator is used for that purpose.

The trick is to make a distinction here between layers and tensors. When you write something --> Convolution(...), the something is a tensor. The Convolution layer reads the number of input channels from that tensor. The output of this --> operation is a new tensor with the same number of channels as the Convolution layer has filters.

So what happens is really:

let resizedTensor = input --> Resize(width: 28, height: 28)

let conv1Tensor = resizedTensor --> Convolution(kernel: (5, 5), channels: 20, activation: relu, name: "conv1")

let pool1Tensor = conv1Tensor --> MaxPooling(kernel: (2, 2), stride: (2, 2))

...and so on...

In fact, this is valid syntax and it’s how you would create more complicated graphs such as branches etc.
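The tensor/layer mechanics described above can be shown in miniature: a hypothetical descriptor and a single layer type, where `-->` threads the channel count through. This is illustrative only, not the actual project code:

```swift
// A "tensor descriptor": it only records how big the data between layers is.
struct TensorDesc { var channels: Int }

// A layer: it reads input channels from the incoming tensor and its output
// channel count equals its number of filters.
struct Conv {
    var filters: Int
    func apply(to input: TensorDesc) -> TensorDesc {
        TensorDesc(channels: filters)
    }
}

infix operator -->: AdditionPrecedence

// Applying a layer to a tensor yields a new tensor descriptor.
func --> (input: TensorDesc, layer: Conv) -> TensorDesc {
    layer.apply(to: input)
}

let out = TensorDesc(channels: 3) --> Conv(filters: 20) --> Conv(filters: 50)
// out.channels == 50
```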

If Swift would support __call__, then the code would look like the following, which is exactly what the Keras functional API looks like:

let resizedTensor = Resize(width: 28, height: 28)(input)

let conv1Tensor = Convolution(kernel: (5, 5), channels: 20, activation: relu, name: "conv1")(resizedTensor)

let pool1Tensor = MaxPooling(kernel: (2, 2), stride: (2, 2))(conv1Tensor)

My project is not using S4TF but under the hood it uses MPSCNN, the iOS deep learning library. That’s why these “tensors” aren’t really things that hold the computed activations, so maybe “tensor descriptor” would be a better term. The object just describes how large the data is that goes between two layers, but isn’t really used for anything else (because MPSCNN does things a little differently than TF). In Keras it is an actual TF tensor object.


But in Keras you need to define build to make this work, i.e. a separate method that returns the shape of the new tensor given an input shape.

Yes, that is still required here. The code snippet I showed only builds a graph that connects layers through “tensors”. This graph doesn’t do anything yet.

At some point you have to call compile(), which goes through the graph and figures out how big the tensors are and actually allocates everything needed to run the layers. The layer objects indeed have something like this build() method that knows how big the output tensor is (taking into consideration the input tensor but also padding, strides, etc).

The above approach probably wouldn’t work directly with S4TF objects, so you’d need to create wrappers that allocate the real S4TF layers inside this build() method.
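For the spatial dimensions, the size computation such a build() step performs for a convolution or pooling layer is the standard formula (this is textbook arithmetic, not code from either project):

```swift
// Standard output-size formula for convolution/pooling:
// out = (in + 2 * padding - kernel) / stride + 1
func outputSize(input: Int, kernel: Int, padding: Int = 0, stride: Int = 1) -> Int {
    (input + 2 * padding - kernel) / stride + 1
}

// Matches the LeNet-style network shown earlier in the thread:
// 28x28 input, 5x5 conv -> 24; then 2x2 max pool with stride 2 -> 12
```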

Not to take away from your general point, but Swift does have __call__, called func call. It is in the late stages of standardization and already in the S4TF compiler. The design is converging to the func call syntax (which is in the compiler now) but needs a last round of community review on the language design and bikeshed on the word ‘call’.
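The feature ultimately shipped in Swift 5.2 under the name callAsFunction; a minimal illustration of callable instances, with a hypothetical layer type:

```swift
// Hypothetical "layer" whose instances can be called like functions via
// callAsFunction, the shipped form of the feature discussed above.
struct Scale {
    var factor: Double
    func callAsFunction(_ x: Double) -> Double { x * factor }
}

let double = Scale(factor: 2)
let y = double(21)  // sugar for double.callAsFunction(21); y == 42
```

With this, the Keras-style `Convolution(...)(resizedTensor)` spelling becomes legal Swift.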



I got a start on re-writing the DataBlock API in a more Swifty/functional style, just to show what that might look like – or at least, showing how to do the work in plain Swift in a functional style.

I got all the way up to the SplitLabeledData type, and I think there’s a pretty clear way forward, but sadly I don’t have more time to spend on it right now. (Annoying…)

Basically, my plan was:

  • try to separate data set config info and generic manipulators more clearly.
  • remove speculative generics
  • use Swift’s built-in data manipulators like map and partition(by:) when possible
  • lean on higher-order functions and function composition, to emphasize data flow.
  • somewhat eccentrically, use tuples instead of named structs

I realize the last point is not particularly idiomatic but I think this was an interesting way to focus on “just the data” before getting too caught up with naming things.

Anyway, here’s the work so far if anyone is curious:


This is really neat! After a first pass through it, it looks to me that the only specific assumption is actually the ImageNette type name itself. Do you have any thoughts about generalizing it in the future, maybe by turning it into a protocol that other data collections need to conform to?

I would want to complete the rest of the pipeline, all the way into the TensorFlow Dataset types, in a concrete way before thinking too hard about how to generalize.

That said, I suspect the tuples would become structs, and instead of using protocol constraints to align different types, you could rely on pure functions and on types matching at the boundary of a function signature or return type. That way you can be generic without creating tight coupling between generic types.

I think it’s probably a flaw in the current design that as you move down the pipeline the data moves into progressively more complex types with more dependencies on the previous types. That probably is not necessary.

I also think one thing we’re bumping into here is the absence of a Dataframe library in Swift. This would be a fun project. F# or maybe Scala are probably good comparables. And the Clojure build tool boot might also have some relevant design ideas.


What if I told you @sgugger has finished doing that, and made it generic already? :slight_smile: See 08c_data_block-lightlyfunctional.ipynb . Would love to hear your thoughts.


One of the best parts of the current Python data block API is its expressiveness:

databunch = itemsFromWhatever(...)

That gives you the feeling that you can control anything.
In Swift I would expect something like this:

databunch = items
      >| splitWithWhatever(...) // -> TrainTestSplitted
      >| labelWithWhatever(...) // -> TrainTestLabeled
      >| addTransform(tfms) // -> TrainTestLabeledWithTfms
      >| optionallyAddTest(...) // -> TrainTestLabeledWithTfmsAndTest
      >| databunch(...)

With the big advantage that all of these are (possibly pure) functions!
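A `>|` like the one sketched above can be defined as plain reverse function application, so each stage is just a function from one pipeline state to the next. The operator name and precedence here are just one possible choice:

```swift
// Reverse application: feed a value into a function, reading left to right.
infix operator >|: AdditionPrecedence

func >| <A, B>(value: A, f: (A) -> B) -> B {
    f(value)
}

// Toy usage, with plain functions standing in for the pipeline stages:
func sortedInts(_ xs: [Int]) -> [Int] { xs.sorted() }
func head(_ xs: [Int]) -> Int { xs[0] }

let smallest = [3, 1, 2] >| sortedInts >| head  // smallest == 1
```

Because `>|` is left-associative, long chains like the databunch example read top to bottom, each stage consuming the previous stage’s output.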


Wha?! It’s a thing of beauty! :heart_eyes:
