Open Source Projects: how to contribute

Hi everybody!
I’m currently a Master’s student in Europe (computational biology) but I’ll probably try to switch a little and get into ML.
One thing I definitely need to improve is my practical coding experience. I can write small, self-contained programs, but I'm not used to medium-sized libraries, let alone big ones. I had a look online and found a number of resources listing open source projects that are looking for contributors. For instance, scikit-learn has some entry-level tasks. Still, I find it difficult: the library is huge, so getting into it is overwhelming.
Does anybody have any suggestions? Smaller libraries, or online courses on software engineering? Any suggestion is very welcome! Thank you.



I think learning about object-oriented programming in Python would be a great start. In parallel, you can look at Caffe2 and see how it is implemented in an object-oriented manner; that should give you some insight. Also, writing a big library shouldn't be the overall goal. Start with one project, and when you move on to the next one, reuse the code you have already written. Over time it will grow into a library of tools you actually need. Let me know if this helped and if I can be of any more help! Thanks!


I was in the same situation as you before. The fastai library is the first big library whose source code I felt comfortable digging into. If you follow part 2 of the course, you will get some tips from Jeremy on how to work with the library. Right now, with the new library, there are plenty of ideas you can try, and I believe you will certainly get support from people here.

P.S. Feeling overwhelmed is normal, so just relax. Don't rush, it will only wear you out. You will get familiar with it over time.


I actually thought about getting involved in the development of the library, but there is no list of “issues” or needed improvements like other libraries have on GitHub (you know, the Issues section). But thanks.

Actually, they don't use GitHub issues for development at this stage. They use this forum. You can find more information in the “fastai dev” category.


Hey Lorenzo, I used to be in the same shoes you are in right now. I also thought about contributing to open source projects, with a motivation similar to yours. However, my efforts led to many failures.

Looking back now, I feel that trying to get involved in open source projects too early can be just as harmful as diving deep into ML/DL theory early on while taking this class. Yes, there are countless things to do in both cases: the former has an endless list of issues and the latter an infinitude of math. The problem is that you can get lost in them without a sense of purpose and direction. They simply do not seem connected to what you are learning and doing in this class.

After asking some open source veterans, I gradually uncovered a pattern. Most of them contribute to open source because their daily work leads them to it. They report and fix issues because they ran into them at work; they create new features and improve existing ones because they need them; they build a brand new framework with a better-designed architecture because they see how it will benefit their current project. The key is that all of these activities come naturally when you are doing your own work. You don't have to go out of your way just for the sake of “contributing to open source”.

Therefore, my suggestion is to focus on the course and the topics you are interested in. Work on lots of projects and challenge yourself with increasingly difficult and complex ones. Soon you will be complaining about the libraries you are using and start contributing to them. :joy:


Definitely listen to @dhoa, who is doing a great job of becoming a contributor. :slight_smile: Here’s a direct link to the category: #fastai-dev


Agreed, scikit-learn is a huge project with a lot of Cython dependencies and extensions. I started by contributing to smaller projects, like mesa or sklearn_pandas. Simple commits of just a few dozen lines of code, but better than nothing.

Also, the fastai library is a great opportunity to look into very accessible machine learning source code. From my point of view, it is much clearer than Keras or TensorFlow. Probably because it is at an early stage, or because it uses a less sophisticated class hierarchy, without inventing too many layers of abstraction.


Yes, agreed: working with a library every day makes you an expert in it. You also start spotting bugs, suboptimal solutions, etc., because in a dynamic language like Python the source code is directly available.
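
If it helps, here is a minimal sketch of what I mean by the source being directly available. It uses the standard library's `inspect` module on `json.dumps`, which I picked only as a stand-in; the same calls should work on most pure-Python functions in the libraries you use, but not on parts compiled with Cython or C.

```python
import inspect
import json

# json.dumps is only an illustrative target; swap in any pure-Python
# function from a library you work with every day.
target = json.dumps

# Path of the file where the function is defined.
source_file = inspect.getsourcefile(target)

# Full source text of the function, exactly as written in that file.
source = inspect.getsource(target)

print(source_file)
print(source[:300])  # the first few hundred characters are enough for a glance
```

In a Jupyter notebook, `json.dumps??` shows much the same information.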