APL Statistics via Bootstrapping

I created an initial blog post on calculating some basic statistics (confidence intervals and P values) in APL using bootstrapping. Take a look and let me know what you think!

And definitely let me know if I made any errors or if there are better ways to do anything included :slight_smile:

I seem to have to reshape arrays more than I think I should, and I’m not sure whether that’s just something to get used to or whether I’m not using some of these symbols properly.

https://isaac-flath.github.io/APL-Exploration/posts/Basic%20Stats.html

Thanks for sharing this! I don’t know who the intended audience is, but as a beginner, I’d really appreciate some explanation of not just the “what is happening” which you do include, but also “how it is happening”.

For example, I can (sort of) make sense of the first statement:

⍝ Create some data for us to sample
⎕←V ← 5?10

as

Draw 5 random numbers from 1 to 10 and assign to V and also display to screen.

But, for the next one, things get tricky for me:

⍝ Get random sample of indexes
⎕←S←?10/⍴V

So I tried to work through it, but not knowing where the logical boundaries are, I just went from right to left.

`⍴V` just gives me 5 which means it must be just the length of the V vector.
`/⍴V` gives me error ... which tells me I need to have something to the left
`10/⍴V` gives me a 10 item vector filled with 5's 
`?10 /⍴V` gives me a 10 item vector with random values between 1 to 5

But then the question is: where are the logical boundaries? Are ?10 and ⍴V the LHS and RHS of the operator (or function) “/”?

So, as you can see, without knowing ‘trains’ and the precedence rules for all the operators involved (or even whether the thing I’m looking at is an operator or a function, and in what role it is being used), it’s slow going for newcomers.

If there were some basic comments on not just the “what’s happening” but also “how it’s happening and why it’s happening the way it’s happening?”, it would help immensely towards understanding. Then again, it all depends on the audience, and if it’s not meant for the newbs, then that’s fine too, please ignore.

BTW, your posts are extremely helpful in the sense that I always find them approachable and feel I can try to work through them as you build up to the solution, so, this process of trying to figure things out presented in small chunks is quite illuminating. Thanks for sharing your expertise via your blog.

I will work on adding more prose about the symbols. In the meantime, I would build up an understanding of ?10/⍴V in this way (broken out one symbol at a time):

  • ⍴V gives the length of the vector. Let’s say it’s a vector with 5 numbers in it. Then ⍴V is 5.
  • 10/5 gives me the number 5 repeated 10 times. So 10/5 is 5 5 5 5 5 5 5 5 5 5.
  • ?5 gives me a random number between 1 and 5. ?5 5 gives me 2 random numbers, each between 1 and 5.

We can then put it all together by adding parentheses to the expression and printing each intermediate value as it is calculated, so the 3 steps above show up in real time: ⎕←?(⎕←10/(⎕←⍴V))

One thing that could be causing confusion is that 5?10 is a dyadic use of ? (known as deal), where it samples without replacement. I cannot do 15?10, for example, because there aren’t 15 distinct numbers to draw between 1 and 10. However, in the second example, ?10/⍴V, we are using the monadic ? (known as roll). In its monadic form it gives a random number for each value in its argument rather than drawing a sample. So in these 2 examples the ? symbols are not the same function.
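
To make the difference concrete, here is a small sketch of the two uses of ? side by side, assuming Dyalog APL with the default ⎕IO←1. The specific numbers are only illustrative and the results change on every run, so no output is shown.

```apl
5?10      ⍝ dyadic ? (deal): 5 distinct numbers from 1..10, no repeats
10?10     ⍝ deal all 10: a random permutation of 1..10
?10/5     ⍝ monadic ? (roll): 10 independent numbers from 1..5, repeats allowed
⍝ 15?10 would signal DOMAIN ERROR: there are not 15 distinct values in 1..10
```

Bootstrapping needs sampling with replacement, which is why the index vector is built with the monadic form.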

Hmm ok, now I’m a little bit more confused re: monadic/dyadic ? function :joy: but it’s all good! more for me to chew on.

BTW, I’m just working through these lines and trying to understand the get_sample function.

I was wondering if I could create get_samples2 such that

get_samples2 ← { (⊂ ?⍺/⍴⍵) ⌷ ⍵}

Since I don’t know how to “set seed” I can’t create reproducible random numbers, but it seems to behave like the get_sample you have.

I’m just wondering if this is equivalent, i.e. without using the operator? Was the operator used for “purity” sake? That is, is it more APL’ish to use that operator than to construct the function the way I did in get_samples2?

I can think of it in 2 ways. First, starting from the right and thinking through each symbol one at a time:

  • V is a vector. Nothing to do since it’s not a function or operator. It’s just a value.
  • ⍴ is a monadic function applied to the vector. We know it’s not an operator, so it can’t take a function as an argument. We also know it cannot be dyadic, because only 1 value is passed to it (V): the thing to its left is a function (/), and functions can only be accepted as arguments by operators. Therefore this MUST be monadic, with V as its argument.
  • / must be dyadic. It’s a function here, with a number on the left (10) and a number on the right (⍴V). It can’t be acting as an operator because there’s no function for it to take (⍴V has already been applied and turned into 5 in the previous step). And 2 arguments are passed to it (1 on the left and 1 on the right), so it must be dyadic.
  • ? must be monadic. It’s not an operator, so it can’t take a function as an argument. Therefore it can only take the stuff to its right as an argument.

Yes, that’s equivalent. The operator removed the need for parentheses. I don’t know what APL’ish is, as I am new to APL myself. That said, I generally focus on doing what makes the most sense to me and don’t worry about conforming to whatever norm exists. There are many very helpful norms and there are many very nonsensical norms. I find it best to figure out what works for me and do that, and err on the side of following the norms when I don’t have a strong opinion either way.

So far I have found I generally like removing parentheses and letting as much as possible be handled by normal APL precedence, so that I always read through the code in the same way. I think removing them is convenient at times for that reason. On the same note, I don’t believe they should always be removed. I don’t know where the line is for me yet, as I am still new to APL. Right now I err on the side of removing them because that’s what I am less familiar with, and I figure I should get comfortable with both approaches. Of course I may change my opinions on this in the future as I learn APL more!
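
One way to check the equivalence is to reset the random seed before each call and compare the results. The sketch below is only an illustration: it assumes the operator in question is commute (⍨), which this thread’s text does not preserve, so get_b is a hypothetical reconstruction and not necessarily the blog’s get_sample. It also assumes Dyalog APL, where ⎕RL can be assigned a plain integer seed (42 is arbitrary).

```apl
get_a←{(⊂?⍺/⍴⍵)⌷⍵}     ⍝ parenthesised form (same body as get_samples2)
get_b←{⍵⌷⍨⊂?⍺/⍴⍵}      ⍝ hypothetical parenthesis-free form using commute
V←10 20 30 40 50        ⍝ made-up data
⎕RL←42 ⋄ a←3 get_a V    ⍝ seed, then draw 3 values
⎕RL←42 ⋄ b←3 get_b V    ⍝ same seed, other form
a≡b                     ⍝ should give 1 if both forms select the same items
```

With commute, X f⍨ Y means Y f X, so the index expression can sit on the right and be evaluated first without parentheses.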

Also, random seeds are described at https://help.dyalog.com/18.2/#Language/System%20Functions/rl.htm#RandomLink:

For example:

[Screenshot not preserved: Screen Shot 2022-07-20 at 6.07.35 PM]
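
A minimal sketch of seeding in text form, assuming Dyalog APL, where ⎕RL (random link) can be assigned a plain integer seed. The seed value 42 is arbitrary and not taken from the thread.

```apl
⎕RL←42        ⍝ fix the seed
a←?10/5       ⍝ ten rolls, each between 1 and 5
⎕RL←42        ⍝ reset to the same seed
b←?10/5       ⍝ the same ten rolls again
a≡b           ⍝ should give 1: the two draws match
```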

Ok, so should it be read as follows?

?10/⍴V

     ↓

?10 , / , 5

     ↓

Roll a 10 sided die, replicate , 5 times

Not quite. 10/5 must happen first. Try working out what the lines below are doing; they show both orders.

      ⎕←replicateFirst ← 10/5
5 5 5 5 5 5 5 5 5 5
      ?replicateFirst
5 1 4 2 5 5 2 2 1 2
      ⎕←rollFirst ← ?10
7
      rollFirst/10
10 10 10 10 10 10 10

FYI, putting two verbs next to each other is the same as using jot between verbs (except that in some cases you’ll need parentheses for precedence).
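
As a small illustration of that point for the monadic case (the functions ⌽ and ⍳ are just convenient examples, not taken from the blog post, and ⎕IO←1 is assumed):

```apl
⌽⍳5              ⍝ reverse applied to the result of iota: 5 4 3 2 1
⌽∘⍳ 5            ⍝ the same two functions joined with jot: 5 4 3 2 1
(⌽⍳5)≡⌽∘⍳ 5      ⍝ 1, the two spellings agree
```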

OMG! I was way off! :smiley: (this is after “reading” the help pages on all these operators btw !!)

Ok, so

Roll the dice up to each number in this array -> (10 times repeat the number 5 (ie; length(V)), put it in an array) is more akin to what’s happening here.

Thanks for your patience, I honestly don’t think I’ll ever be able to learn APL enough to use it in any meaningful way tbh. It’s kind of a depressing thought, because I can feel how powerful this language really is.

I think whether or not you learn it enough to use it in a meaningful way is entirely up to you. It takes working through stuff like this with consistency and persistence over a long period of time.

That said, there are a lot of really powerful things to learn. Most of which you will never learn because there’s just too much out there to be learned. If APL isn’t your cup of tea there are tons of other equally important, productive, and powerful things you can do instead! Nothing to beat yourself up over.

But I think anyone can learn APL, it’s just about the (hopefully enjoyable) grind.

Truth be told, I’m a sucker for punishment and I’m drawn towards these types of languages (but maybe not for long enough, it seems). I’ve spent my share of time on Forth-like and Lisp-like languages, just for the fun of it. But you are right, it takes deliberate practice and patience to get good at this. APL presents its own special set of challenges.

In a language like Forth you can sort of muddle your way through: even though having to deal with the stack is a bit weird, it’s still approachable. The APL family seems unforgiving, because you must have your concepts very clear before you can do even basic things.

Nonetheless, I’m working my way through a tutorial on the Dyalog site. I think it’s better to finish that rather than attempting to understand more involved concepts like the ones in the Sampling/Bootstrapping post. It’ll come in due time, I’m sure.

Makes sense.

I only wrote this post because I was trying to write one on decision trees and got stuck and couldn’t figure it out. I decided to go write something else instead, and did this statistics post.

And then when I was writing this one I got stuck and had to ask my brother how to look up values in a vector when writing it.

So none of this is coming naturally to me either.

Hi - nice article, there are a couple of things I’d change.

sampling_distribution ← n ss ⍴ ∊ (ss←10) get_sample¨ (n←1000)⍴⊂data

This can be sampling_distribution←data[? (n←1000) (ss←10) ⍴ ≢ data]

It can be tempting to create standalone functions for things like get_sample, but you often end up doing things multiple times in roundabout ways compared to doing them directly. The direct version is also faster, so it’s a win-win really.

Also later on you use 10 instead of ss, and for clarity I’d use ss whenever you’re referring to sample size. (Also the same with calculating the confidence intervals)

Hello @rak1507, could you kindly explain what this line is doing?

Thanks!

It creates a 1000 by 10 matrix of ≢data, runs ? to get a random index, and then indexes into data.
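
The same steps can be tried on a smaller scale. This is a sketch with made-up data and sizes (5 values, 3 samples of size 4), assuming the default ⎕IO←1; the names idx and the values in data are illustrative only.

```apl
data←10 20 30 40 50
n←3 ⋄ ss←4
⎕←idx←?n ss⍴≢data    ⍝ 3×4 matrix of random indices, each between 1 and 5
⎕←data[idx]          ⍝ 3×4 matrix of sampled values: one bootstrap sample per row
```

Indexing a vector with a matrix of indices returns a result with the shape of the index matrix, which is why no further reshaping is needed.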

Hi Rak. This is great feedback, as was the feedback you gave over Discord. I knew that there had to be a better way than what I had done, but I could not figure out what that would be. This is really helpful :smiley:

Min/Max

I made a couple of changes. On Discord you pointed out that taking the min/max of a sorted vector like I was (i.e. ⌈/(⊂order90)⌷sample_means) is wasteful, because it’s already sorted and I could just take the first and last values. I used left and right tack for this, along with the bracket notation you showed in your sampling distribution advice.

sample_means[⊣/order90]
sample_means[⊢/order90]
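
For anyone unfamiliar with the tacks, here is a small sketch of why reducing with them picks out the ends of a vector. The vector v is made up; order90 and sample_means come from the blog post and are not reproduced here.

```apl
v←3 1 4 1 5
sorted←v[⍋v]      ⍝ ascending sort: 1 1 3 4 5
⊣/sorted          ⍝ left tack reduction keeps the first element: 1 (same as ⌊/v)
⊢/sorted          ⍝ right tack reduction keeps the last element: 5 (same as ⌈/v)
```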

Sampling Distribution

I made the change you recommended. I knew there had to be a better way than what I had done but couldn’t figure out what that would be. Changed this in all 3 spots.

My understanding of how this works (mostly to solidify my own knowledge):

  • ≢ data: Gives me the number of data points available to sample from
  • (n←1000) (ss←10) ⍴: Is reshaping to create the 1000x10 shaped array. The value from ≢ data is broadcast to fill that shape.
  • ?: Then ? (roll) is applied to each number in the array to give a number between 1 and ≢ data (because ≢ data was broadcast to the whole array)

That leaves me with a big matrix of index locations, and I can simply index into the array of values with brackets.

Variable Consistency

I replaced quite a few hard-coded numbers with the variables I had already defined for them, which you also mentioned.

Finally fixed our font

Thanks to the code I copied from Jeremy’s repo, my font is now a fancy APL font!

You need to grab my styles.css too, and then re-run your docs build, to make the outputs nice :slight_smile: