New forum category especially for beginners

jeremy · November 8, 2017, 1:56pm

@part1_v2 : I’ve noticed that our forum has gotten rather full of more advanced topics, since there are quite a few people in the course who have already taken last year’s version, or who have completed other courses.

I’m concerned that this could be somewhat intimidating and confusing for folks that don’t have any background in machine learning or deep learning - which is probably over half the participants in this course. When reading answers to a question, it can be hard to separate which answers are advanced additional topics, vs which are directly related to the course content we’ve covered so far.

Therefore, I’ve created a new forum category that is specially for discussions targeted to people with little or no background in deep learning. The category is here: http://forums.fast.ai/c/part1v2-beg . To clarify:

All types of questions and answers are still welcome here in the original #part1-v2 category; just be aware that answers to your questions may include more advanced topics. So use this category if you’re happy to see these topics too.
However, only beginner-friendly questions and answers should be posted in the new #part1v2-beg category. More advanced students are welcome (and encouraged!) to join these discussions, but please try to ensure your explanations, questions, etc. are accessible and appropriate for people totally new to this field.

I’ve copied the ‘welcome post’ from the new category below, which has some more information. Feel free to reply to this post if you have any questions or comments about this new category. Consider this an experiment - we’ve not created a “beginner friendly” forum for previous courses. BTW I did consider creating an “advanced” category instead, but then I felt that the existing category we have here is already too full of advanced topics for that to work well.

Category welcome post:

This category is strictly for discussing topics that we have brought up in class, questions about running the code that we’ve looked at in class, or any other issues related to the lessons we’ve covered already. The goal is for this category to be approachable and non-intimidating to folks with the minimum prerequisites (no machine learning background, one year of coding experience).

In particular, posts should not cover future lessons, topics that aren’t in this course, setting up or using platforms other than AWS, Crestle, or Paperspace, or other topics outside of what we’ve covered in the lessons.

If you have a suggestion for other blog posts, books, etc which specifically cover a topic we’ve looked at in class in a way that is friendly to beginners, feel free to mention it when appropriate!

I have created a thread for each lesson: if you have any questions or comments about that lesson, please use that thread. Feel free to create new threads if your question or comment is more general, or covers some foundational issue like logarithms, ssh, numpy, etc.

abdulhannanali · November 8, 2017, 2:01pm

Thanks a lot for creating this. As a beginner with no experience at all with machine learning, this will surely help me and many other beginners feel more comfortable asking beginner-targeted questions. The posts in the other category, although intimidating at times, contain a lot of information about different aspects of neural networks and training them; I have learned a lot just from reading the threads.

jeremy · November 8, 2017, 2:02pm

Thanks for the feedback @abdulhannanali - that’s great!

naveenmanwani · November 8, 2017, 2:14pm

Thank you @jeremy, you solved the biggest problem.

angryziber · November 8, 2017, 2:50pm

Thank you, @jeremy!

ravimahar · November 8, 2017, 3:29pm

Finally, something I can understand. I hope this thread will make a lot of things easier. In all the other threads it looks like people are building rockets … and I was feeling left behind.

jeremy · November 8, 2017, 4:04pm

I hope it helps too! One way to make sure it meets your needs is to ensure you ask and answer questions in that category, so that way you know it’s exactly what you require!

smortezavi · November 12, 2017, 12:15am

Hi everyone, for those who need a refresher on the Python classes and tools we will be using, here is a great resource that helped me.

ravimahar · November 20, 2017, 5:03am

Q1. How can we modify the layers in the model? Is that the parameter num_workers, which defaults to 8?


Q2. data = ImageClassifierData.from_paths(PATH, tfms=tfms_from_model(resnet34, sz))
How does the above code load the data for both dogs and cats? I looked into the code: does it by default treat every folder under valid (dir) as a class, and build the datasets accordingly?

From ImageClassifierData:
@classmethod
def from_paths(self, path, bs=64, tfms=(None,None), trn_name='train', val_name='valid', test_name=None, num_workers=8):
    trn,val = [folder_source(path, o) for o in (trn_name, val_name)]

ravimahar · November 21, 2017, 1:06am

Ans1: I think I got it. The 8 layers are the ones defined in the resnet34 arch. The same shows up here.

Still not sure on the Q2

Any help appreciated.
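
On Q2, the snippet above shows from_paths calling folder_source once for trn_name and once for val_name. A common pattern for this kind of loader is to treat each immediate subfolder as one class and every file inside it as an example of that class. Below is a minimal sketch of that pattern, assuming a valid/cats, valid/dogs layout. The helpers scan_class_folders and make_demo_tree are hypothetical names made up for illustration; this is not a copy of fastai's folder_source.

```python
import os
import tempfile

def scan_class_folders(root):
    """Hypothetical helper: treat each immediate subfolder of `root`
    as one class and each file inside it as an example of that class."""
    # Sort so that class indices are stable from run to run.
    classes = sorted(d for d in os.listdir(root)
                     if os.path.isdir(os.path.join(root, d)))
    fnames, labels = [], []
    for idx, cls in enumerate(classes):
        for fn in sorted(os.listdir(os.path.join(root, cls))):
            fnames.append(os.path.join(cls, fn))
            labels.append(idx)  # label is the index of the folder name
    return fnames, labels, classes

def make_demo_tree():
    """Build a throwaway directory tree with empty placeholder files."""
    root = tempfile.mkdtemp()
    layout = {"cats": ["c1.jpg"], "dogs": ["d1.jpg", "d2.jpg"]}
    for cls, files in layout.items():
        os.makedirs(os.path.join(root, "valid", cls))
        for f in files:
            open(os.path.join(root, "valid", cls, f), "w").close()
    return root

demo_root = make_demo_tree()
fnames, labels, classes = scan_class_folders(os.path.join(demo_root, "valid"))
print(classes)  # ['cats', 'dogs']
print(labels)   # [0, 1, 1]
```

With this pattern, adding a third folder next to cats and dogs would simply add a third class, with no other code changes.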

ravimahar · December 7, 2017, 3:47am

Hi,

In the dog breed code below, how is the file name (id) in the csv mapped to the image name in the folder, i.e. train and test?

data = ImageClassifierData.from_csv(PATH, 'train', f'{PATH}labels.csv', test_name='test', num_workers=4,
                                    val_idxs=val_idx, suffix='.jpg', tfms=tfms, bs=bs)

I did some digging in the .py script. It looks like the code below gets this done, but I am still not sure how the file name in the csv is mapped to the image.

Signature: csv_source(folder, csv_file, skip_header=True, suffix='', continuous=False)
Source:
def csv_source(folder, csv_file, skip_header=True, suffix='', continuous=False):
    fnames,csv_labels,all_labels,label2idx = parse_csv_labels(csv_file, skip_header)
    full_names = [os.path.join(folder,fn+suffix) for fn in fnames]
    if continuous:
        label_arr = np.array([csv_labels[i] for i in fnames]).astype(np.float32)
    else:
        label_arr = nhot_labels(label2idx, csv_labels, fnames, len(all_labels))
        is_single = np.all(label_arr.sum(axis=1)==1)
        if is_single: label_arr = np.argmax(label_arr, axis=1)
    return full_names, label_arr, all_labels
File: ~/courses/fastai2/courses/dl1/fastai/dataset.py
Type: function
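
The line in csv_source that does the mapping appears to be the os.path.join(folder, fn+suffix) one: each id read from the csv is joined with the folder name and the suffix to form a file path. Here is a minimal standalone sketch of just that step. The helper names_from_csv and the two csv rows are made up for illustration, and the sketch assumes the id is the first column and the label is the second.

```python
import csv
import io
import os

def names_from_csv(folder, csv_text, suffix=""):
    """Hypothetical helper: turn the id column of a csv into file paths
    by joining folder + id + suffix, mirroring the full_names line above."""
    reader = csv.reader(io.StringIO(csv_text))
    next(reader)  # skip the header row
    rows = [(row[0], row[1]) for row in reader]
    full_names = [os.path.join(folder, file_id + suffix) for file_id, _ in rows]
    labels = [label for _, label in rows]
    return full_names, labels

# Made-up rows in an "id,breed" layout, for demonstration only.
demo_csv = "id,breed\nimg001,beagle\nimg002,poodle\n"

full_names, labels = names_from_csv("train", demo_csv, suffix=".jpg")
print(full_names)
print(labels)
```

So with suffix='.jpg' and folder 'train', an id of img001 in the csv points at the file img001.jpg inside train; nothing looks at the image contents to make the match.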

Moody · December 10, 2017, 7:55am

Could someone please write a step-by-step procedure for downloading papers using the arXiv API? I really want to run the notebook and get a better understanding of how it works. The relevant links are provided below. Thanks in advance.

https://arxiv.org/help/api/index

ecdrid · December 10, 2017, 11:03am

https://arxiv.org/help/bulk_data_s3

Download PDF examples:

import arxiv
# Query for a paper of interest, then download
paper = arxiv.query(id_list=["1707.08567"])[0]
arxiv.download(paper)
# You can skip the query step if you have the paper info!
paper2 = {"pdf_url": "http://arxiv.org/pdf/1707.08567v1",
          "title": "The Paper Title"}
arxiv.download(paper2)

Check The Readme…

Moody · January 20, 2018, 6:58am

Thanks a lot! It is much easier than I expected (without using the Twitter API here). Change subject and n to suit one’s needs.

import arxiv
subject = "computer science"
n = 10000
obj = arxiv.query(search_query = subject, max_results = n)

import pandas as pd
df = pd.DataFrame(obj)
df_s = df[['summary', 'title']]

df_s.to_csv(f'{PATH}arxiv_cs.csv')

ecdrid · January 20, 2018, 7:00am

Computer Science or Computer Vision?

Moody · January 20, 2018, 7:09am

Change subject = "cs.CV" for “Computer Vision and Pattern Recognition”. It is a subset of computer science.
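
For anyone calling the arXiv API directly rather than through the arxiv package: the API documentation linked earlier describes a cat: field prefix for restricting a search to one subject category. A minimal sketch that only builds the query URL (it makes no network request) is below. The helper build_arxiv_url is a hypothetical name, and the sketch does not assume anything about how the arxiv package forms its own queries.

```python
from urllib.parse import urlencode

# Base endpoint of the arXiv query API.
ARXIV_API = "http://export.arxiv.org/api/query"

def build_arxiv_url(category, start=0, max_results=10):
    """Hypothetical helper: build a query URL restricted to one category."""
    params = {
        "search_query": f"cat:{category}",  # cat: is the category field prefix
        "start": start,                     # offset of the first result
        "max_results": max_results,         # number of results requested
    }
    return f"{ARXIV_API}?{urlencode(params)}"

url = build_arxiv_url("cs.CV", max_results=5)
print(url)
```

The start and max_results parameters allow results to be requested in pages, which can be gentler on the service than asking for everything in one call.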

manishsh · February 24, 2018, 12:31pm

Hi, I have a newbie question: if I have to classify images into cats, dogs, lions, and foxes instead of just cats and dogs, what do I do? Create four different folders, or do something else?

Moody · February 25, 2018, 8:17am

You can use the same lesson 1 code with four different folders. Try it and let us know how you go.

sumo · March 7, 2018, 9:58pm

Hi @Moody, I had a quick question on the approach outlined in your response to download the arXiv dataset. Do you still get the “CAT” column in your dataset as Jeremy’s notebook did? I don’t get it, which I guess is fine, but I was wondering. Also, when I used subject = "computer science" and n = 10000, my query returned only around 120-130 items. Was that true for you also?

manishsh · March 8, 2018, 6:45pm

Thanks Moody. I have another question: Microsoft created a tool for object detection and tagging for use with CNTK and TensorFlow. Does a similar tool exist for integration with PyTorch?