Port fastai-make-pr-branch from bash to python


(Stas Bekman) #21

I was thinking - perhaps we make this script generic and bear with typing an extra ‘fastai’ in the command line? that way we can use it for any github repo. Or even better we make proper named arguments, and then fastai can be one of the defaults :wink:

I started generalizing it: https://github.com/stas00/git-tools/blob/master/github-make-pr-branch.py

just need to get the auth sorted out in the forking stage, once you get a chance to come around it.


(Ilia) #22

@stas Oh, you’re right, makes sense. Sure, I’ll try to look into this issue during this week (or weekend).

Great! Yes, agree, I was intentionally making the script very similar to the bash version so we get the same expected behavior but nothing prevents us from making it more flexible :smile:


(Stas Bekman) #23

ping?


(Ilia) #24

@stas Sorry for a late response! These weeks I have a bit lost among boosted trees:sweat_smile:

Yeah, you right, there is a problem with curl when it asks you to enter the password. I guess that I have a GitHub token configured because can’t remember when it asked me to enter the password last time :smile:

As you’ve correctly pointed out, this call hangs:

out, err = p.communicate()

As soon as communicate waits for the process’s termination. I believe we have a couple of possible solutions.

  1. Explicitly ask a user for a password every time when curl is executed using input() prompt and inject this into curl command like:
curl -u {username}:{password} ...
  1. Use pexpect or something similar to communicate with the curl process, wait for a prompt and send user’s response.

I guess the first approach is the most simple one. However, I think that we can have a similar issue with, for example, git tool. If a user hasn’t configured the password and username properly, I guess it also could ask for credentials on push.

How do you think it would be better to proceed here? Would it be enough to just ask for a password and inject it into curl invoking command?


Essentially, the first solution boils down to something like:

password = input('Please enter your password to authenticate with git:')
password = password.strip()
run(f"curl -u {user}:{password} https://api.github.com/repos/{orig}/{repo}/forks -d ''")

(Stas Bekman) #25
  1. Explicitly ask a user for a password every time when curl is executed using input() prompt and inject this into curl command like:
curl -u {username}:{password} ...

[…]
I think that we can have a similar issue with, for example, git tool

Ah yes, if they use https instead of ssh, then yes they will need to provide their password too!

And we have two cases:

  1. user uses https - ask for password right away at the beginning of the script - albeit the password is cached and can be configured to be cached forever
  2. user uses ssh - ask for password only if we need to fork, i.e. at the point of doing fork

But then if the user has a token configured we don’t want to bother the user with this. Can we reliable check the the token is correct?

Also trouble with passwords is that we don’t want to log those for security reasons, so that means that all areas where we log git/curl commands the password shouldn’t be logged.

  1. Use pexpect or something similar to communicate with the curl process, wait for a prompt and send user’s response.

I won’t suppose python has some magic way to just pass the control to the shell, as it happens automatically in the bash version of this script.

Based on our notes so far, it looks like approach (1) would be simpler, but will require a lot more whatif coding and hiding the password - perhaps your proposed approach (2) is then the most straightforward one?

But then in the case of https (instead of ssh) won’t the user need to enter the password twice? once for git and another time for curl’s fork. I guess it’s OK if the user is fine with constantly typing the password in anyway (i.e. should switch to ssh or do token).

There is also:


but it relies on a user having a github token.

To conclude I feel that your proposed direction (2) while more complex in its basics, it would be simpler overall, and we can encourage users to use ssh and token to minimize typing.

What do you think?


(benedikt herudek) #26

used the python script and it doesnt seem to set the upstream branch, see below. Using the shell script works fine.

fatal: The current branch test-scripts has no upstream branch.
To push the current branch and set the remote as upstream, use

_ git push --set-upstream origin test-scripts_


(Ilia) #27

@stas Yeah, right, in the case of the first solution, it will require a lot of conditionals. Another solution could be to ask for a password at the very beginning of the execution without making differences among various cases, explicitly inject into every command and hide from the output, but I guess it is not so good idea :smile:

I think that asking for the password twice is OK. As you correctly pointed out, unless one has a token or uses ssh method, one needs to type the password each time when doing something with git so probably asking for a password each time when it is required is not that bad.

Yes, agree, direct terminal automation seems to be more sophisticated but the most universal solution to account all possible cases. Alongside showing a notification to set up a token or switch to the ssh when https method is chosen.

@Benudek Oh, I see. Will check this soon!


(Stas Bekman) #28

no, git caches the password for 15min by default, so it’ll be at most one time password prompt from git. and one more time for forking if a token is not used. So worst case scenario - 2 times. But the script can cache it the first time and use it by itself the second time, so really we are down to just one prompt if ever.

And, of course, it needs to handle the failed password case too. Wow, suddenly it’s much more complicated than the bash script that just works. Perhaps you can find a ready implementation for that in the github land.

So the other option we could do is keep the bash script as the main product, and have the python script for windows users and require a token to be used with it? Look ma, no password needed! And if someone from the windows land wants the password option they can code it :wink:


(Ilia) #29

Yeah, it seems that bash gives a lot of these things for free :smiley:

Makes sense! I guess it is worth to investigate this question a bit to figure out if there are any out-of-the-box solutions to deal with passwords. But the idea to enforce token sounds reasonable :smile: So we can have some minimal solution for Windows and iteratively improve into something more sophisticated.


(Stas Bekman) #30

I know I’m not helping you to make a decision and instead providing you a way out of needing to do more work. So let’s just make a decision and have a resolution.

  1. If you have the resources and desire to make a complete auth solution - please let’s do it
  2. If not, let’s adjust the script to check that there is a valid token and if it not then fail and tell the user what to do.

And if it’s 2 we can then re-open this dev project for someone else to sort out the auth part.


(benedikt herudek) #31

good idea to get this call

pip install -e ".[dev]"

into the scripts? seems I always need to run and then forget it :wink:


(Ilia) #32

@stas Ok, makes sense! Let’s do as proposed and go with the (2) solution to get some working solution.


(Stas Bekman) #33

Perhaps with an optional flag. We shouldn’t be doing this by default - this is quite beyond the scope of this functionality and we don’t know whether a user wants it or not, whether they have activated the right env and it has effects on conda which won’t know this happened, you always need to conda uninstall fastai before you do that. Albeit, conda developers are trying to become pip-aware. But they aren’t quite there yet.

The easiest simple solution is to print an instruction to do the dev install as a reminder.


(Stas Bekman) #34

My feeling is that your priorities have changed and you probably have other more important things to worry about first, @devforfu. Which is great!

It has been several months with this project being in a stale situation.

So in order to be able to complete things, shall we make this project available for someone else to finish?

And if you’d like to finish it let’s do it soon.

Thank you.


(Ilia) #35

Hi @stas, sorry, you’re right. I’m a bit distracted now with Kaggle and preparing myself for the next part of the course. Let’s do the following. If I can’t finish the project during this weekend, please feel free to hand it to someone more capable to finish this sooner!

And I’ll be glad to help with this or any other dev project on the list as soon as get more free time.


(Ilia) #36

@stas Ok, here is my attempt to fix the issue with the token! :smiley:

Basically, I am just doing the following to communicate with GitHub without creds:

if args.auth == 'ssh':
    args.prefix = 'git@github.com:'
else:
    check_token()
    args.token = os.environ[ENV_VAR]
    args.prefix = f'https://{args.token}@github.com/'

And this, to run a curl command:

check_curl_is_available()
creds = user if args.auth == 'ssh' else f'{user}:{args.token}'
run(f"curl -u {creds} https://api.github.com/repos/{orig}/{repo}/forks -d ''")

Also, the notification is shown if GITHUB_TOKEN env variable is not set and the execution is terminated. Could you please check if my changes make sense? I’ve tested locally, and it seems to work more or less smoothly. (Though I am going to check some additional edge cases as well). Please let me know if we need to fix something else here. I’ll have time next days so we can make iterations faster without waiting for weeks :smile:

In addition, as you’ve previously proposed, probably we can modify the argparse CLI a bit to accept positional arguments with default values to make this tool more generic.


@Benudek I am not sure why the script wasn’t setting up the upstream. We have the following function in the snippet:

def set_upstream(branch):
    print(f"\n\n*** Set this {branch}'s upstream to simplify push and alikes")
    run(f'git push --set-upstream origin {branch}')

And when I was testing this, it has shown me the message:

*** Set this my-fancy-feature-branch's upstream to simplify push and alikes
running: git push --set-upstream origin my-fancy-feature-branch
Branch 'my-fancy-feature-branch' set up to track remote branch 'my-fancy-feature-branch' from 'origin'.

So it seems the --set-upstream is somewhat working. However, probably I just can’t reproduce the error you had, or maybe we’re running the snippet on different OS’es?


(benedikt herudek) #37

Hhm, no idea. I tend to use the shell script


(Ilia) #38

Yeah, so far bash does its work better. Probably we can test this thing more extensively soon to make it more robust.


(benedikt herudek) #39

This is done btw


(Stas Bekman) #40

Remember that a user may need to enter their ssh password too if they don’t use an ssh-agent, so in this scenario the script would get stuck again.

I will test it all once this last one is cleared out.

Note that we started migration to: https://github.com/fastai/git-tools
but haven’t removed the old scripts from the fastai repo yet.

So please make all future PRs into that new repo.

In addition, as you’ve previously proposed, probably we can modify the argparse CLI a bit to accept positional arguments with default values to make this tool more generic.

Yes, we really save typing an extra word with the current script, so probably it should be a generic github-make-pr-branch (w/ and w/o py), which would work with any github repo.

Then each user can copy the script, put into their PATH and edit it to their liking - defaults, etc.

But, given your free time is limited, let’s finish sorting out the password prompting first.