@jamestjw (github handle) trained an xresnet with SimpleSelfAttention on card suits and calculated the mean weights for each pixel on the N * N attention grid. I thought it was pretty neat. This is what it looks like on an example: