Beginner: Beginner questions that don't fit elsewhere ✅

Does anyone know how to store my Hugging Face username and password on my computer or in VS Code, so I don’t have to enter them every time I push to Git / Hugging Face from VS Code? Currently, every time I push I’m prompted for my Hugging Face username and password.

When I push I get this:

/mnt/c/Program\ Files/Git/mingw64/libexec/git-core/git-credential-manager-core.exe get: 1: /mnt/c/Program Files/Git/mingw64/libexec/git-core/git-credential-manager-core.exe: not found

This looks like an error related to Git Credential Manager, which is included with Git for Windows. I think I have Git for Windows installed.
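
A possible workaround (sketch only, and note it stores the token in plain text in ~/.git-credentials) would be to switch to Git’s built-in credential store instead of the missing credential manager:

# use git's built-in plain-text credential store instead of git-credential-manager
git config --global credential.helper store
# the next push asks for the username and token once, then saves them
git push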

I’m actually confused about what to use for blogging at this point, because the fastpages project (which was on GitHub) has been deprecated, and Quarto seems to have a steep learning curve.

Hi all, I am going through the material for the chapter 1 clean version on Paperspace and just digging into the fastai library using the doc method, but when I try to click on the "show in docs" link I get a "Not Found
The requested URL was not found on this server.

Additionally, a 404 Not Found error was encountered while trying to use an ErrorDocument to handle the request."

Btw, the link to the source code does work.

Is anyone else experiencing this as well?

Hey :slight_smile:,
When I tried this, my nbdev library on Paperspace was completely outdated (still v1), which created the wrong links.
Updating nbdev fixed the issue for me; keeping fastai up to date would also be a good idea, so run:

mamba install -c fastchan fastai nbdev

from a terminal. Let us know if this does not solve it for you :slight_smile:

Thanks @benkarr, that solves my problem. Do you know if this will be a persistent change, or will I need to add it to the pre-run.sh to make it persistent?

I just restarted the instance I made the changes to and they are gone… I don’t use Paperspace much myself, so I cannot tell you exactly where to add it :frowning:. Have you watched the Live Coding sessions? I think Jeremy talks extensively about how to set up Paperspace and how to make persistent changes in the first couple of sessions.

Yes I remember the trick, I will just proceed with that.
Thanks for the help
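
For reference, a minimal sketch of that trick, assuming the /storage/pre-run.sh convention from the Live Coding sessions (the exact path and filename may differ on your setup):

# /storage/pre-run.sh: executed each time the Paperspace instance starts
# re-install the updated packages so the fix survives instance restarts
mamba install -y -c fastchan fastai nbdev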

1 Like

Hey everyone,
I was wondering how we can save the databunch object.
Every time I run my notebook, the databunch method creates new random batches of images, and I want to control the data batches for my experiments.
I tried using databunch.save("PATH/TO/.pkl"), but it gives me a "ctypes objects can't be pickled" error.
Here is the code I tried for saving the databunch:

train, valid = ObjectItemListSlide(train_images), ObjectItemListSlide(valid_images)
item_list = ItemLists(".", train, valid)
lls = item_list.label_from_func(lambda x: x.y, label_cls=SlideObjectCategoryList)
lls = lls.transform(tfms, tfm_y=True, size=patch_size)
data = lls.databunch(bs=batch_size, collate_fn=bb_pad_collate, num_workers=0).normalize()
data.save("test_data_bs8.pkl")

The error:

ValueError                                Traceback (most recent call last)
<ipython-input-22-88cd1dd3e50b> in <module>
----> 1 data.save("test_data_bs8.pkl")

3 frames
/usr/local/lib/python3.7/dist-packages/fastai/basic_data.py in save(self, file)
    153             warn("Serializing the `DataBunch` only works when you created it using the data block API.")
    154             return
--> 155         try_save(self.label_list, self.path, file)
    156 
    157     def add_test(self, items:Iterator, label:Any=None, tfms=None, tfm_y=None)->None:

/usr/local/lib/python3.7/dist-packages/fastai/torch_core.py in try_save(state, path, file)
    414             #To avoid the warning that come from PyTorch about model not being checked
    415             warnings.simplefilter("ignore")
--> 416             torch.save(state, target)
    417     except OSError as e:
    418         raise Exception(f"{e}\n Can't write {path/file}. Pass an absolute writable pathlib obj `fname`.")

/usr/local/lib/python3.7/dist-packages/torch/serialization.py in save(obj, f, pickle_module, pickle_protocol, _use_new_zipfile_serialization)
    378         if _use_new_zipfile_serialization:
    379             with _open_zipfile_writer(opened_file) as opened_zipfile:
--> 380                 _save(obj, opened_zipfile, pickle_module, pickle_protocol)
    381                 return
    382         _legacy_save(obj, opened_file, pickle_module, pickle_protocol)

/usr/local/lib/python3.7/dist-packages/torch/serialization.py in _save(obj, zip_file, pickle_module, pickle_protocol)
    587     pickler = pickle_module.Pickler(data_buf, protocol=pickle_protocol)
    588     pickler.persistent_id = persistent_id
--> 589     pickler.dump(obj)
    590     data_value = data_buf.getvalue()
    591     zip_file.write_record('data.pkl', data_value, len(data_value))

ValueError: ctypes objects containing pointers cannot be pickled

Can anyone tell me how I can save my databunch object? Or is there a method with which I can save the images created and encapsulated in the databunch to a directory, and then load them later with their bounding box labels?

I don’t know anything about databunch, but considering you’re trying to use this to solve the above problem, perhaps there is another way. Have you considered setting the seed?
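
Something like this (a generic sketch, not specific to databunch) pins down the usual sources of randomness before the data gets built:

import random
import numpy as np
import torch

seed = 42
random.seed(seed)                  # Python's own RNG
np.random.seed(seed)               # numpy RNG
torch.manual_seed(seed)            # PyTorch CPU RNG
torch.cuda.manual_seed_all(seed)   # PyTorch GPU RNGs, if CUDA is used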

We deprecated fastpages because we thought that Quarto is at least as easy, and it provides more useful features.

It sounds like either we (or the Quarto team) have done a poor job of positioning it, if it comes across as having a steep learning curve, or we’ve misunderstood how complex it is for new users.

It would be really super useful if you could say more about what has given the impression it has a steep learning curve, since that would help us find ways to both decrease that steepness, and also better explain it.

4 Likes

I’ve been thinking about this for a while, and @Voi_l8ight may say I’m off here, but here are my thoughts:

The Quarto documentation is humongous. There are so many features that I’m constantly stuck in decision overload trying to see just what I can do. On top of that, there’s not really a simple guide to Quarto in nbdev in the sense of just saying “here’s how you minimally do XYZ”, or at least not one I’ve seen; instead we’re mostly pointed to the docs, which have far too much information for someone who’s new and isn’t trying to go down rabbit holes of how everything works.

I think a major win here to help others is to take a page (literally) out of fastpages. One thing I loved with fastpages is that I simply had to fork the repo, run one action, and then I could write a notebook separately and drag/drop it into GitHub later whenever I wanted to post a new blog. If there were a simple template repo set up for blogging, where the user doesn’t need to type anything in and their blog just “works”, that’d be wonderful.

Because while the answer can often be “go check this Quarto resource”, I’d rather have the encyclopedia downsized to something small and efficient if possible, and in the best-case scenario not need it at all :slight_smile:

E.g. I find myself constantly going to @Ezno’s resource here, which very quickly details some of the major things you can do in Quarto. Something like this as just a one-page doc with the major bits folks would most often use would be phenomenal (so nothing insane like graphviz or mermaid), and it should be prominent in the nbdev documentation as well so it can be found easily (as in the TOC, beginner tutorials referencing it, etc.; something where someone just trying to find it doesn’t need to dig for ages).

Just my $0.02; hope my own thoughts help some :slight_smile: (And this comes from a place of love, as nbdev v2 is amazing.)

cc @hamelsmu

4 Likes

Thanks for the reply, Jeremy. By ‘steep learning curve’ I meant that there’s no specific tutorial containing all the details and steps in one place, which would make life easier for novice programmers like me.
Thank you

Yeah the documentation is humongous and overwhelming for novice guys like me (or maybe just me)
The fastpages tutorial was very clear and to the point.

1 Like

Many thanks to you both for this helpful feedback.

3 Likes

Noted. We will have to think about this. It might be that the fastest way to create a blog doesn’t even involve GitHub at all!

I also find the documentation to be very confusing, and it took me quite a while to get used to it, especially with the cell options scattered everywhere. I still find it somewhat hard to navigate.

For example I don’t believe they have a single cheat sheet with all the cell options, page options, and site options (some options apply at multiple levels as well)

We will have a talk about this internally and see what we can do and/or talk to the Quarto folks

Thanks for the feedback

3 Likes

Yeah, I have tried setting the seed, but I haven’t tried a seed option, if there is one, for the databunch object.

np.random.seed(42)
train_images = list(np.random.choice(training_set, train_samples_per_scanner))

So, I only used the seed for the random selection of images, but how they get processed by the databunch is a black box for me. That’s why I wanted to save those databunch objects, but it looks like saving them gives me that error.
Does anyone know how to save data for object detection problems in fastai?

1 Like

@muellerzr this is a really easy way to get started with Quarto blogs

Note: GitHub is not required, and you can publish directly to quarto-pub. You can also use a GUI if you want to use VSCode or RStudio for most of the setup!
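
For example, from a terminal it’s roughly this (the project name is just a placeholder):

quarto create-project myblog --type website:blog
cd myblog
quarto publish quarto-pub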

I’ve added a link to that document in the blogging guide for nbdev. HTH

2 Likes

After finishing up the notebook for lesson one, I wanted to try to build a model that can predict whether a picture is of Elon Musk or not. I went through the same steps of finding relevant data, loading it into a DataBlock, and tuning it, and while this works by telling me who the image is of, I am curious whether that’s the best approach for this specific small problem, since it seems like the model contains a lot of extra information about other humans too. Also, how do I get it to output whether it’s Elon or not, instead of the labels it outputs right now?

Here is my notebook: Is it an Elon Musk ? | Kaggle

1 Like

You could treat all your other classes (“bill gates”, “trevor noah”, etc.) as the same class and then let the model learn to differentiate between Elon and all of the others. I guess the easiest way to do this would be to copy all of the other images into one folder, call it “others”, and then proceed as normal.
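
A rough sketch of that copy step (the folder layout is just an assumption, with one folder per person under a single dataset root):

from pathlib import Path
import shutil

path = Path('images')          # hypothetical dataset root
others = path/'others'
others.mkdir(exist_ok=True)
for folder in path.iterdir():
    # copy every non-Elon person's images into the single "others" folder
    if folder.is_dir() and folder.name not in ('elon musk', 'others'):
        for img in folder.iterdir():
            shutil.copy(img, others/img.name)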

3 Likes

You could do what @rasmus1610 suggests, but since you are already using a labeling function, you could also adjust it to your needs. Right now you use parent_label to determine which label each instance gets, by returning the folder the image is in:

parent_label('path/to/folder/file.png')
'folder'

So you could check if that folder is ‘elon musk’ to generate your label from that e.g.:

def get_y(fn):
    folder = parent_label(fn)
    return 'elon' if folder == 'elon musk' else 'no-elon'

yields:

get_y('path/to/elon musk/file.png')
'elon'
get_y('path/to/else/file.png')
'no-elon'
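
If you want to see where that plugs in, here is a rough sketch of the DataBlock from lesson 1 with the custom get_y (the path, split, and image size are just assumptions):

dls = DataBlock(
    blocks=(ImageBlock, CategoryBlock),
    get_items=get_image_files,
    splitter=RandomSplitter(valid_pct=0.2, seed=42),
    get_y=get_y,                 # the binary labelling function from above
    item_tfms=Resize(224)
).dataloaders(path)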

Make sure to check the dataloaders (dls.show_batch()) to see if the labels match your expectations.

2 Likes