New forum category especially for beginners

Hi everyone! For those who need a refresher on the Python classes and tools we will be using, here is a great resource that helped me.


Q1. How can we modify the layers in the model? Is that the parameter num_workers, which defaults to 8?


Q2. data = ImageClassifierData.from_paths(PATH, tfms=tfms_from_model(resnet34, sz))
How does the above code load the data for both dogs and cats? I looked into the code - does it by default consider all the folders under the valid directory and create that many classes and datasets?

From ImageClassifierData:

@classmethod
def from_paths(cls, path, bs=64, tfms=(None,None), trn_name='train', val_name='valid', test_name=None, num_workers=8):
    trn,val = [folder_source(path, o) for o in (trn_name, val_name)]

Ans1: I think I got it. The 8 layers are the ones defined in the resnet34 arch. The same shows up here.

Still not sure about Q2.

Any help appreciated.

Hi,

In the dog breeds code below, how is the file name (id) in the CSV mapped to the image name in the train and test folders?

data = ImageClassifierData.from_csv(PATH, 'train', f'{PATH}labels.csv', test_name='test', num_workers=4,
                                    val_idxs=val_idx, suffix='.jpg', tfms=tfms, bs=bs)

I dug into the py script; it looks like the code below gets this done, but I'm still not sure how the file name in the CSV is mapped to the image.

Signature: csv_source(folder, csv_file, skip_header=True, suffix='', continuous=False)
Source:
def csv_source(folder, csv_file, skip_header=True, suffix='', continuous=False):
    fnames,csv_labels,all_labels,label2idx = parse_csv_labels(csv_file, skip_header)
    full_names = [os.path.join(folder,fn+suffix) for fn in fnames]
    if continuous:
        label_arr = np.array([csv_labels[i] for i in fnames]).astype(np.float32)
    else:
        label_arr = nhot_labels(label2idx, csv_labels, fnames, len(all_labels))
        is_single = np.all(label_arr.sum(axis=1)==1)
        if is_single: label_arr = np.argmax(label_arr, axis=1)
    return full_names, label_arr, all_labels
File: ~/courses/fastai2/courses/dl1/fastai/dataset.py
Type: function
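The mapping happens in the full_names line above: each first-column entry (the file id) from the CSV is joined with the folder name and the suffix to build the image path. A minimal sketch of that expression, with made-up ids and labels standing in for what parse_csv_labels would return (not real rows from labels.csv):

```python
import os

# Hypothetical parse_csv_labels output: first CSV column is the file id,
# second column is the label.
fnames = ["000a1b2c", "000d3e4f"]
csv_labels = {"000a1b2c": ["boston_bull"], "000d3e4f": ["dingo"]}

folder, suffix = "train", ".jpg"

# Same expression as in csv_source: id + suffix, joined onto the folder,
# gives the path of the image on disk.
full_names = [os.path.join(folder, fn + suffix) for fn in fnames]
print(full_names)
```

So there is no explicit lookup: the id in the CSV is expected to be the file name (minus extension) inside the train/test folder, and adding suffix='.jpg' completes it.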

Could someone please write a step-by-step procedure for downloading papers via the arXiv API? I really want to run the notebook and get a better understanding of how it works. The relevant links are provided below. Thanks in advance.

https://arxiv.org/help/api/index


https://arxiv.org/help/bulk_data_s3

Download PDF examples:

import arxiv
# Query for a paper of interest, then download
paper = arxiv.query(id_list=["1707.08567"])[0]
arxiv.download(paper)
# You can skip the query step if you have the paper info!
paper2 = {"pdf_url": "http://arxiv.org/pdf/1707.08567v1",
          "title": "The Paper Title"}
arxiv.download(paper2)

Check the README…


Thanks a lot! It is much easier than I expected (without using the Twitter API here). Change subject and n to suit one’s needs.

import arxiv
subject = "computer science"
n = 10000
obj = arxiv.query(search_query = subject, max_results = n)

import pandas as pd
df = pd.DataFrame(obj)
df_s = df[['summary', 'title']]

df_s.to_csv(f'{PATH}arxiv_cs.csv')

Computer Science or Computer Vision?

Change subject = "cs.CV" for “Computer Vision and Pattern Recognition”. It is a subset of computer science.


Hi, I have a newbie question - if I have to classify images into cats, dogs, lions, and foxes instead of just cats and dogs, what do I do? Create four different folders, or something else?

You can use the same lesson 1 code with four different folders. Try it and let us know how you go.
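A minimal sketch of why this works: from_paths calls folder_source, which infers the class list from the sub-folder names under train/ and valid/, so four folders give four classes automatically (the toy layout below is made up for illustration):

```python
import os, tempfile

# Build a toy lesson-1-style layout with four class folders per split.
root = tempfile.mkdtemp()
for split in ("train", "valid"):
    for cls in ("cats", "dogs", "lions", "foxes"):
        os.makedirs(os.path.join(root, split, cls))

# The class list is just the sorted sub-folder names under train/.
classes = sorted(os.listdir(os.path.join(root, "train")))
print(classes)  # -> ['cats', 'dogs', 'foxes', 'lions']
```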


Hi @Moody, I had a quick question on the approach outlined in your response for downloading the arXiv dataset. Do you still get the “CAT” column in your dataset as Jeremy’s notebook did? I don’t get it, which I guess is fine, but was wondering. Also, when I used subject = "computer science" and n = 10000, my query returned only around 120-130 items. Was that true for you too?

Thanks Moody, I had another question - Microsoft created a tool for object detection and tagging for use with CNTK and TensorFlow; does a similar tool exist for integration with PyTorch?

I assume you are referring to the category (“CAT”) column in Jeremy’s notebook.

Similar information is shown under the “arxiv_primary_category” column.

n=10000 is a big enough number that I don’t need to change it all the time. But I definitely got more than 120-130 items for computer science. You can check the details easily using Pandas. :panda_face:

It seems this is not a beginners’ question. :thinking: Are you referring to this?

@jeremy Can you please suggest the best way to go through the course?

I have a couple of questions in mind:

  1. Should I try to fully understand each concept and write the simple algorithms from scratch as they come up in the course, or should I just run the code cells and wait until they are covered in more depth later?
  2. Is every concept, like cosine annealing, stochastic gradient descent with restarts, and data augmentation, revisited later, or should we work through these ourselves?

To other learners, or people who have completed the course: please do share your approach too.

I think the best way is to watch each video weekly, then dig through all the Python code @jeremy has provided and try to see what each line is doing. Make some changes, get more accurate results, and complete the course.

Thanks a lot for your feedback :slight_smile:

Is this course http://bit.ly/2L2GSR1 good for beginners?

On using load_learner, I got the following error - any ideas on how to resolve the issue?

in
----> 1 learn_ett = load_learner(Path('/baseline-b4'))

/opt/conda/lib/python3.7/site-packages/fastai/learner.py in load_learner(fname, cpu)
    551 "Load a Learner object in fname, optionally putting it on the cpu"
    552 distrib_barrier()
--> 553 res = torch.load(fname, map_location='cpu' if cpu else None)
    554 if hasattr(res, 'to_fp32'): res = res.to_fp32()
    555 if cpu: res.dls.cpu()

/opt/conda/lib/python3.7/site-packages/torch/serialization.py in load(f, map_location, pickle_module, **pickle_load_args)
    592 opened_file.seek(orig_position)
    593 return torch.jit.load(opened_file)
--> 594 return _load(opened_zipfile, map_location, pickle_module, **pickle_load_args)
    595 return _legacy_load(opened_file, map_location, pickle_module, **pickle_load_args)
    596

/opt/conda/lib/python3.7/site-packages/torch/serialization.py in _load(zip_file, map_location, pickle_module, pickle_file, **pickle_load_args)
    851 unpickler = pickle_module.Unpickler(data_file, **pickle_load_args)
    852 unpickler.persistent_load = persistent_load
--> 853 result = unpickler.load()
    854
    855 torch._utils._validate_loaded_sparse_tensors()

AttributeError: Can't get attribute 'NonNativeMixedPrecision' on <module 'fastai.callback.fp16' from '/opt/conda/lib/python3.7/site-packages/fastai/callback/fp16.py'>