What is the best way to understand the FastAI codebase to also improve general Python knowledge and become a better software engineer?

the_doctor · March 7, 2019, 5:32am

I would like to be able to -

Get better at Python (be a good software engineer + better at thinking about how to plan and structure medium to big projects)
Contribute to open source projects (including fastAI)

while I improve my current DL knowledge.

My initial plan was to try to start writing my own basic custom library for some DL stuff, compare it with fastAI’s and improve my own and get better at python as well. My library would essentially have been a duplicate of fastAI’s at the end (just for learning purposes and nothing more) but I thought this approach would help me get better.

I tried to look at some of the code (for example, https://github.com/fastai/fastai/blob/master/fastai/basic_train.py) and there’s a lot I am not familiar with in terms of -

General python knowledge (like I didn’t know you could do something like model:nn.Module or -> in python)
Structuring the code for larger projects (how do you think and plan ahead, how do you make decisions about which direction your codebase should go etc)

So, replicating/duplicating this code is quite daunting and I am not sure how to proceed. I know some of it would boil down to practice in general or reading more code. But that doesn’t help me understand how I could go from working on smaller projects to bigger ones at all (I am not currently a software engineer, through work or through education). It’s equally fascinating and daunting/demoralizing to me.

Any suggestions on what I could do here?

Thanks for any help! (Please let me know if this isn’t the best thread for this discussion)

PegasusWithoutWinds · March 7, 2019, 5:35am

Check out this post.

the_doctor · March 7, 2019, 6:18am

Sorry, but I didn’t find that helpful. Every time I come across such posts for an open source project, it assumes a level of knowledge around software development or programming that is not something I have managed to learn from anywhere while being self-taught.

So I get stuck in the cycle of -

wanting to contribute -> checking out where -> feeling intimidated by how little I know -> leaving that project and trying to work on something that covers that skill gap -> not being able to find how to cover that skill gap -> asking about it -> being told to contribute to open source projects

It’s become a bad habit for me to stay in that cycle.

I checked that contribution list, and most of the items are claimed already. I can’t gauge how and where I could start in a way that fits my current level with Python (or programming in general) but also helps me go deeper and learn more/get better.

So, I thought I could try to first understand parts of the codebase and understand how things are structured and try to replicate the library in my own custom version and learn along the way. Contributing directly to the library doesn’t seem feasible to me. Maybe it’s just a confidence issue with me now because I have been going through stuff like this for far too long and I just don’t know what to do.

Thanks once again. I will see if I can figure something out from that post.

PegasusWithoutWinds · March 7, 2019, 7:55am

I hear you, man. Looking at the codebase can be very intimidating. It still is to me today.

However, you don’t have to understand all the nitty-gritty details of the codebase in order to contribute. I would recommend starting with documentation contributions. It is largely about playing around with the undocumented parts of the library, seeing what they do, and adding some docs for them. You might need to read a small part of the codebase related to the function you need to write about, but it is usually quite manageable. Similarly, you don’t have to know all the flashy Python 3 features. Basic features will do.

Since I really don’t know exactly where your knowledge of Python stands, I cannot give more specific advice. I would recommend getting started with documentation improvements. Once you bump into some specific questions, you can ask them in the forum and I am sure people will be able to show you ways to proceed.
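For anyone following along: the `model:nn.Module` and `->` syntax asked about at the top of the thread is Python's type annotation syntax. Here is a minimal sketch of how it reads. The `Model` class and `run` function are made up for illustration (a stand-in for `nn.Module` so it runs without PyTorch) and are not taken from fastai.

```python
from typing import List


class Model:
    """Hypothetical stand-in for nn.Module, only for illustration."""

    def __init__(self, scale: float):
        self.scale = scale

    def predict(self, xs: List[float]) -> List[float]:
        return [x * self.scale for x in xs]


def run(model: Model, xs: List[float]) -> List[float]:
    # "model: Model" documents the expected argument type and
    # "-> List[float]" documents the return type.
    # Python stores these hints but does not enforce them at runtime.
    return model.predict(xs)


print(run(Model(2.0), [1.0, 2.0]))
print(run.__annotations__)  # the hints are kept as plain metadata
```

Editors and type checkers read these hints, which is a large part of why they help in a bigger codebase: the signature tells you what a function expects without reading its body.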

iNLyze · March 7, 2019, 10:38am

How about picking one topic - a feature you are missing, a cool new development from a paper you’d like to implement, a piece of code from another project you are porting to fastai - and contributing that to fastai? You could first implement it locally and if you have trouble with PRs - ask in the forum. I am sure one of the hot shots around fastai will be more than happy to help you to integrate your parts. So, one by one you could learn about the inner structure of fastai rather than replicating all at once. I think that is asking too much from yourself and might leave you frustrated. And, believe it or not, everybody is in that same zone (or has been at some time).

jbuzza · March 7, 2019, 11:29am

I think this is a very interesting question and find myself in a similar position. I spend a lot of time googling to find good Python solutions or to understand code. I no doubt learn a lot doing this but it seems a slow process. While there are plenty of online Python courses I suspect they don’t address the type of questions posed here (or start too much from the basics).

I think the suggestions to start with something small or documentation are good and I have done this myself on a project I am working on but I feel this doesn’t necessarily help in building a broader knowledge of good practices.

Feras · March 8, 2019, 8:12am

Actually, I find myself in exactly the same position as you. It’s really a tricky question and I don’t think there is a straightforward answer.

I started by looking at the codebase, but it didn’t help that much. Then I thought I would contribute to the documentation, which I hope to do soon.

Recently I came across this Project.
It seemed to be exactly what I was looking for. It is a small project that aims at replicating some functionality of Pandas. The basic structure and documentation are already available, but this is okay as a beginning.

I’m still at the beginning, but came across many useful things like: creating environments, documentation, the basic structure of the project, testing and some other cool stuff.

I hope you find this useful.

Also, keep posting to this topic in case you manage to find other interesting ideas.

peterwalkley · March 8, 2019, 11:22am

Software Engineer with over 30 years’ experience here.

There isn’t a one size fits all answer for this as everyone learns differently, but you will learn most by trying to get something to work, failing, figuring out why you failed and then trying again until it works.

Training courses, lessons, Computer Science degrees, books, blogs etc can only give you pointers on where to start and help when you are stuck. Only by doing it will you absorb the knowledge. It takes time and you can’t learn it all at once. Be prepared to be frustrated and annoyed you can’t do it - but persist and look for help and you will get there.

Don’t expect to learn it all, ever. I am still learning and still will be when I shuffle off to the great commit in the sky.

In my day job, coding is the easy part. Figuring out exactly what the customer needs is the challenge.

PegasusWithoutWinds · March 8, 2019, 12:31pm

That repo looks wonderful!

I think we should do a fastai_cub!

heye0507 · March 8, 2019, 4:39pm

Well, I am the same as you, except I have an MSEE degree. But hey, back in school, they don’t teach you Python.

I started DL in Dec 2018, right before Xmas. My job changed, so I got some extra time. I thought I shouldn’t waste it, so I picked something to learn.

So, long story short, I was in lesson 3 and I wanted to help. So I looked up the fastai library… man, I have to tell you, even if you have some coding experience, it is hard to understand the new Python style. To tell you a story: I searched the whole fastai library looking for a method called cls(…), and no… I thought it was something related to label_cls, but it is not… it is what Python calls a class method.
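For anyone who hits the same confusion: inside a method decorated with `@classmethod`, `cls` is the conventional name of the first parameter, which receives the class itself, so `cls(…)` simply constructs an instance of that class. A minimal sketch, using made-up `Records` and `LabeledRecords` classes that are hypothetical and not fastai's code:

```python
class Records:
    """Hypothetical container, only for illustration."""

    def __init__(self, rows):
        self.rows = list(rows)

    @classmethod
    def from_csv_text(cls, text):
        # `cls` is the class the method was called on,
        # so cls(...) builds an instance of that class.
        rows = [line.split(",") for line in text.strip().splitlines()]
        return cls(rows)


class LabeledRecords(Records):
    """Subclass that inherits the alternative constructor unchanged."""


r = LabeledRecords.from_csv_text("a,1\nb,2")
print(type(r).__name__, r.rows)
```

Because `cls` is whatever class the call started from, the subclass gets an instance of itself back without redefining the method. That is why searching a codebase for a function named `cls` finds nothing.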

But I didn’t give up. One day, while I was playing with the data block API (lesson 3), the from csv was not working as expected. So I wrote a small test function, and it was not working again. I looked into the library, and that is where I made my first PR.

Just keep going. When running the notebooks, don’t just run the lines; try doing things differently. Eventually you will make your first PR. And ask on the forum: people here are very friendly and have helped me a lot.

And you can try joining Kaggle. It feels good when people give you upvotes and you get your first medal. Even when you know you only entered the data science world a couple of months ago, you can still make a contribution.

monocongo · March 8, 2019, 5:35pm

This is spot on from my experience (~25 years as a developer). You will spin your wheels forever going from course to course. It will help, but like most things, you get better at things by doing them. I advise jumping into a project on GitHub that needs some help (shameless plug: try this one) and doing your best to resolve an issue or two. Things will start to make more sense and your skills will improve over time. There’s no shortcut, and there is always more to learn, but that’s the nature of this game, so don’t let it get you down.

I’ve been doing this for a long time and I feel like an absolute beginner when it comes to most things outside of my experience, deep learning and ML is just the latest example. Don’t let yourself become too discouraged, very few people pick up this stuff without lots of trial and error. You can do it!

Benudek · March 9, 2019, 10:58am

write tests: Improving/Expanding Functional Tests

heye0507 · March 9, 2019, 6:40pm

Thanks, I will definitely check it out.