Port fastai-make-pr-branch from bash to python


(Stas Bekman) #1

Project: Port fastai-make-pr-branch to python

To support windows users the currently bash implementation of fastai-make-pr-branch needs to be ported to python.

Requirements: You should have knowledge of git and understand what the current bash script does and of course to have a sufficient python knowledge to do the porting.

The only extra is to make the command line arguments named, which can be implemented easily with argparse. Look at other tools under tools to borrow the main functionality of calling out to external processes and parsing arguments.


Dev Projects Index
How to contribute to fastai [Discussion]
(Ilia) #2

Would be glad to help with this thing as soon as would like to help in library’s development if I can. However, my git knowledge is limited to making new branches, doing forks/PRs, and other basic stuff from the developer’s everyday work :smile:

I have experience with Python and worked a lot with various CLI building packages (argparse, docopt, click, fire, etc.). I work in macOS/ubuntu environments mostly, so familiar with basic bash syntax as well but I have a Parallels-based Windows machine as well.

Talking about porting to Python. Do you think that we should use subprocess calls to interoperate with Git, or is there some programmatical interface/package? Talking about GitHub, I know they expose some API, and there is a third-package on top of it.


(Stas Bekman) #3

Thanks for your interest, @devforfu.

All the logic has already been implemented and works correctly, so it’s mostly replacing bash calls with subprocess calls, keeping the exact git commands and checking success/failure or specific outputs. So based on your description of your experience you shouldn’t have any difficulties there. You can reuse tools/run-after-git-clone as the foundation.

The github api is to work with github-specific functions, which is not what we need, it’s plain git that we want. I’m not sure whether there is a python library for git.

And of course there will be some refactoring to do.

Finally, the logic of fastai-make-pr-branch is also documented here.


(Ilia) #4

Ok got it! Thank you for clarification.

Yeah, you’re right, I know that GitHub API is mostly to retrieve repositories info but wasn’t sure if it provides authentication capabilities or something in addition to manage repos.

Yes, agree, the bash script already contains desired functionality. I’ll review the code and post here required links for your review as soon as have some progress.


(Stas Bekman) #5

Excellent! Just remember to claim the project here, by putting your username in the Volunteer column.


(Jeremy Howard (Admin)) #6

Maybe better to use gitpython instead of subprocess calls?

https://gitpython.readthedocs.io/en/stable/tutorial.html#tutorial-label


(Stas Bekman) #7

git-python is not on conda. I guess since it’s an independent tool - it could be ok. But if down the road we decide to prepackage this and other tools with the fastai package, it’ll be an issue with dependencies. Thoughts?


(Ilia) #8

It is possible to make a fork of this package and move it to conda? Or contact with maintainers to bring the package into one of conda channels?


(Stas Bekman) #9

Everything is possible, but sounds like an overkill, given the extra complication - subprocess works just fine - speed is of no essence here, and would you like to have the extra ongoing responsibility of maintaining that library if you put it on conda? Moreover, it looks like windows support is incomplete, which may disqualify this whole package altogether - except it’s hard to tell what doesn’t work.

Of course, feel free to investigate and I trust you will find the best solution out there.


(Ilia) #10

Yes, agree, I’ll try to keep this thing simple and equivalent to the bash script.

FYI, here is a link to the forked branch where I am working on the tool’s reimplementation.


(Stas Bekman) #11

looking good, @devforfu.

It’ll probably be a good idea to refactor most of the shell calls into a wrapper, once we know for example that git works, some good wrappers can be copied from: https://raw.githubusercontent.com/pytorch/pytorch/master/torch/utils/collect_env.py


(Ilia) #12

Almost done with re-writing from Bash into Python, copied a couple of functions from the repo you’ve referenced above. Very helpful tools :smile: So now going to test the re-written version to make sure that nothing was broken, and probably refactor a bit.


(Stas Bekman) #13

When you get a chance let’s finish this project, @devforfu, yes? When you’re done let me know and we can do the testing then.


(Ilia) #14

@stas Sorry was a bit too distracted last few weeks fighting with multiprocessing memory leaks. Now when the competition is finished I can focus on the project and finalize it within a couple of days. It is already ported from bash into Python but I’ve never run the code to check its correctness. So I am going to do some preliminary checks and then we can start testing.


(Stas Bekman) #15

Excellent, thank you, @devforfu!


(Ilia) #16

@stas Not a problem! Finally, I’ve created a PR with the script. I am still testing it locally so probably there are some edge cases that weren’t proven to work. Nevertheless, I believe the code is ready for a review, and I’ll do required fixes if I’ve missed something :smile:


(Stas Bekman) #17

Fantastic - I merged it and we can tweak if needed from there.

Great work, @devforfu and thank you for your contribution.

Probably now we need to test it on windows.

any takers on windows?

use fastprogress for testing as it’s a much smaller download.

git pull
python tools/fastai-make-pr-branch-py ssh my-github-username fastprogress new-feature-branch

use ‘https’ instead of ‘ssh’ if you don’t have ssh setup with github. and see help for complete details:

python tools/fastai-make-pr-branch-py -h

(benedikt herudek) #18

this is one super helpful script folks! thx @devforfu @stas


(Stas Bekman) #19

@devforfu, I guess you haven’t tested the case where the user doesn’t have a fork or perhaps you have a token setup. Your implementation isn’t taking care of password prompt when this is run:

running: curl -u stas00 https://api.github.com/repos/user/repo/forks -d ''

It just sits there forever…

The bash implementation has no problem, since it transparently handles the interaction part, but in python you actually need to read from the pipe and paste to it the password the user types in. But currently it doesn’t even show:

Enter host password for user 'stas00':

Mind to take a look when you will have free time?

You can reproduce this by creating a fork of something you haven’t forked yet on github, or deleting the existing one.

Thank you.


(Stas Bekman) #20

oh and in python 3 I think the decoding has to be utf-8 not ascii in run().

I’m more used to the run version:

def run_and_return_output(command):
    "Returns (return-code, stdout, stderr)."

    out, err = '', ''
    result = subprocess.run(command.split(), shell=False, check=False, stdout=PIPE, stderr=PIPE)
    if result.stdout: out = result.stdout.decode('utf-8')
    if result.stderr: err = result.stderr.decode('utf-8')
    rc = result.returncode
    return rc, out.strip(), err.strip()

except here we probably need shell=True, but your implementation is just fine, with just replacing it with ‘utf-8’.