General course chat

I saw these tokens in fastai.text tokenization. What do they mean?
‘xxunk’,
‘xxpad’,
‘xxbos’,
‘xxeos’,
‘xxfld’,
‘xxmaj’,
‘xxup’,
‘xxrep’,
‘xxwrep’
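
For reference, these are fastai's special tokens: xxunk stands for a word that is not in the vocabulary, xxpad is padding, xxbos and xxeos mark the beginning and end of a text, xxfld separates fields, xxmaj means the next word started with a capital letter, xxup means the next word was written in all caps, and xxrep / xxwrep mean that the next character / word was repeated n times. A minimal plain-Python sketch of the idea behind xxmaj, xxup and xxrep follows. It is not fastai's own code: the function names and the `min_run` threshold are made up for illustration.

```python
import re

def mark_caps(words):
    """Lower-case each word and put a marker token in front of capitalised ones."""
    out = []
    for w in words:
        if w.isupper() and len(w) > 1:
            out += ["xxup", w.lower()]    # whole word was written in capitals
        elif w.istitle():
            out += ["xxmaj", w.lower()]   # only the first letter was a capital
        else:
            out.append(w)
    return out

def mark_char_repeats(text, min_run=3):
    """Replace a run of one repeated character with ' xxrep <count> <char> '.

    min_run is an illustrative threshold, not necessarily the one fastai uses.
    """
    pattern = re.compile(r"(\S)\1{%d,}" % (min_run - 1))
    return pattern.sub(lambda m: f" xxrep {len(m.group(0))} {m.group(1)} ", text)

print(mark_caps("This movie was GREAT".split()))
print(mark_char_repeats("so good!!!!").split())
```

The point of these markers is that the vocabulary only has to hold lower-case words, while the information about capitalisation and repetition is still available to the model.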

How do I create a datablock when I have one csv that consists of image paths and another csv that consists of labels?

Does anyone know how to post a question about a technical issue I am having with fastai?

I don’t think the lectures will be made available until after they have finished IRL. There is probably some editing and sound post-processing done before the videos are released.

I’m assuming you can ask in the relevant lecture thread (Lesson 1 etc…) or ask in the main forum for the specific version of the course/lesson your question is about.

Here is a link to Jeremy’s post regarding asking questions:

I think this forum should have Lesson threads broken down by general lesson discussion, lesson X related questions along with the main thread. I see some posts in the v3 course forum that are called “advanced” discussion which would naturally deter newbies from asking a question there. I know I would be a little intimidated barging into a forum with “advanced topics” in its title and asking a question I know is a beginner question :slight_smile:

The advanced subforum is shown by default, but you can remove those threads from your view by clicking “none” here:

This general forum category that we’re posting in here is suitable for any non-advanced questions related to anything shown in the lessons - including questions about python functions and libraries used, math concepts mentioned, etc.

How do I use the datablock API when there are 2 folders with images and 2 csv files with labels?

Copy everything into one folder first. Unless there is some I found that it is always a good idea to write a function that creates a new csv that fast.ai will understand. My question is: is it possible to pass a function for the labels instead of a csv, so that this process can be faster?
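
A rough sketch of such a csv-building function, using only the standard library. It assumes that neither csv has a header row, that the value of interest is in the first column of each file, and that row i of the paths csv belongs to row i of the labels csv. The function name and the `name`/`label` column names are placeholders.

```python
import csv

def combine_csvs(paths_csv, labels_csv, out_csv):
    """Write one csv with 'name' and 'label' columns from two row-aligned csvs."""
    with open(paths_csv, newline="") as f:
        paths = [row[0] for row in csv.reader(f) if row]
    with open(labels_csv, newline="") as f:
        labels = [row[0] for row in csv.reader(f) if row]
    if len(paths) != len(labels):
        # Rows are matched by position, so a length mismatch means bad input.
        raise ValueError(f"{len(paths)} paths but {len(labels)} labels")
    with open(out_csv, "w", newline="") as f:
        writer = csv.writer(f)
        writer.writerow(["name", "label"])
        writer.writerows(zip(paths, labels))
    return len(paths)
```

If the two files share an id column instead of being row-aligned, the rows would need to be joined on that id rather than zipped.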

First, my apologies; I couldn’t figure out how to create a new thread, so I’m posting here.

I’m working on a GAN for tabular data, and was wondering two things:

  1. Is there any work being done on tabular GANs in Fastai? I’m working on having the ability to generate continuous, discrete and date variables. I haven’t found any work on this so far in fastai.
  2. Is there any work done on an add_cyclic_datepart inverse, which can recreate the date using the cyclical features and a year?
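
To make the second question concrete, here is a small sketch of inverting a cyclical day-of-year encoding. It assumes the date was encoded as the cos and sin of 2π · (day of year − 1) / (days in that year). That encoding is an assumption for illustration and may not match what add_cyclic_datepart produces, and the function names are made up.

```python
import math
from datetime import date, timedelta

def days_in_year(year):
    return (date(year + 1, 1, 1) - date(year, 1, 1)).days

def encode_day_of_year(d):
    """Map a date to (cos, sin) of its position within the year (assumed encoding)."""
    angle = 2 * math.pi * (d.timetuple().tm_yday - 1) / days_in_year(d.year)
    return math.cos(angle), math.sin(angle)

def decode_day_of_year(cos_val, sin_val, year):
    """Recover the date from the (cos, sin) pair plus the year."""
    n = days_in_year(year)
    # atan2 returns an angle in (-pi, pi]; shift it into [0, 2*pi).
    angle = math.atan2(sin_val, cos_val) % (2 * math.pi)
    # Round to the nearest whole day to absorb floating-point noise.
    offset = round(angle / (2 * math.pi) * n) % n
    return date(year, 1, 1) + timedelta(days=offset)

c, s = encode_day_of_year(date(2019, 7, 4))
print(decode_day_of_year(c, s, 2019))
```

The year has to be supplied separately because the cyclical features repeat every year. For generated (GAN) values the cos/sin pair will not lie exactly on the unit circle, but atan2 only uses the direction of the pair, so the rounding step still returns the nearest day.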

Hi crew! This is my first ever forum post, what a time!

I’m having trouble importing a dataset hosted on Mendeley, here - the XRay one. It’s a normal .zip not a tar.gz file - I’m running

path = untar_data("https://data.mendeley.com/datasets/rscbjbr9sj/2/files/41d542e7-7f91-47f6-9ff2-dd8e5a5a7861/ChestXRay2017.zip?dl=1"); path

and getting the error

OSError: Not a gzipped file (b'PK')

I’m running my notebook on a Google Cloud Compute instance, type n1-highmem-8.

If anyone could give me some pointers to get around this that would be most appreciated!

Solution found. Download in Jupyter notebook with

!wget url/datasetname.zip

then unzip with

!unzip datasetname.zip
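
The error comes from the file type: `PK` is the signature at the start of a zip file, while untar_data tried to open the download as a gzipped tar. The same wget + unzip steps can also be done in Python with the standard library. This is only a sketch; the function name and the `archive_name` default are placeholders.

```python
import urllib.request
import zipfile
from pathlib import Path

def download_and_unzip(url, dest_dir, archive_name="dataset.zip"):
    """Download a .zip from url into dest_dir and extract it there."""
    dest = Path(dest_dir)
    dest.mkdir(parents=True, exist_ok=True)
    archive = dest / archive_name
    if not archive.exists():              # skip the download if already present
        urllib.request.urlretrieve(url, archive)
    with zipfile.ZipFile(archive) as zf:
        zf.extractall(dest)
    return dest
```

The returned path can then be used in place of the `path` that untar_data would have given.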

I am trying to run the course on an Nvidia Tesla P4 GPU, which has 8 GB of GPU memory. CUDA keeps running out of memory when I run the course code as is. I often need to decrease the batch size because of this, and the most tedious part is that I lose 2-3 hours of run time each time and often have to re-run the entire thing.
I was wondering if it would be possible to use my CPU memory for this (I have taken a VM with 52 GB of additional memory). Is there something prebuilt into fastai for this? I came across an NVidia blog post about this.


Would this solve my problem?
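
On the batch-size part: the manual "decrease and re-run" loop can be automated in a rough way, so that one out-of-memory error does not throw away the whole session. This is only a sketch. `train_fn` is a hypothetical function that builds the data and model for a given batch size and trains, and the code assumes the out-of-memory error surfaces as a RuntimeError whose message contains "out of memory".

```python
def run_with_smaller_batches(train_fn, batch_size, min_batch_size=1):
    """Call train_fn(batch_size), halving batch_size after each out-of-memory error."""
    while batch_size >= min_batch_size:
        try:
            return batch_size, train_fn(batch_size)
        except RuntimeError as err:
            if "out of memory" not in str(err):
                raise                     # unrelated error: do not hide it
            batch_size //= 2              # try again with half the batch size
    raise RuntimeError("ran out of memory even at the minimum batch size")
```

It returns the batch size that finally worked together with whatever `train_fn` returned. It does not move anything to CPU memory; it only saves the manual restarts.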

I have done everything recommended here, but it still doesn’t work on datalab. Is there anything else I could try?

Fast.ai is really growing fast :wink:

Hi All,
Has anyone ever pretrained embeddings based on hierarchical features? I would like to do this for items prior to using them in a collaborative learning type model but I’m having a tricky time doing so and would like some advice.

My idea stems from the added value of using pretrained word embeddings for the language model Jeremy uses in the IMDB notebook. If the item embeddings were pretrained to contain their innate categorization, rather than randomly initialized, I imagine the model would train much better. (I would also love to be able to freeze these and train only the user embeddings for the first part of a collab model.)

The data I have has items and hierarchical features such as category, sub-category, manufacturer, and brand. My first idea, which doesn’t seem to be working very well, was to create an embedding NN with BCE loss, training each item embedding and hierarchical feature embedding (including negatively sampled categories with 0s instead of 1s). I put all hierarchical categories into the same embedding space. The trouble is that I can only achieve a loss of ~.38, and looking at item similarity based on cosine distance between item embeddings shows me that the embeddings haven’t really learned much. For example, I expect items in the red wine category to be most similar to other red wine items, or at least to other items in a higher-level category such as alcoholic beverages, but this isn’t the case.

Would training a multiclass classifier to predict the 4 hierarchical features per item work? Or should I use the categories as an input and train to predict 1s or 0s? Any ideas? Thanks
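
For anyone who wants to run the same sanity check, the cosine-similarity comparison between item embeddings described above can be sketched as follows. The item names and vectors are made up for illustration; in practice the vectors would be the rows of the trained embedding matrix.

```python
import math

def cosine_similarity(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

def most_similar(item, embeddings, k=3):
    """Return the k items whose embeddings point in the most similar direction."""
    query = embeddings[item]
    scores = [(other, cosine_similarity(query, vec))
              for other, vec in embeddings.items() if other != item]
    return sorted(scores, key=lambda s: s[1], reverse=True)[:k]

# Toy embeddings, invented for this example.
toy = {"merlot": [1.0, 0.1], "malbec": [0.9, 0.2], "soap": [-0.2, 1.0]}
print(most_similar("merlot", toy, k=2))
```

If the pretraining worked, the nearest neighbours of a red wine item should mostly be other red wines.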

All,

Is there any way to download the lesson videos? I have a lot of free time where I don’t have access to wifi, and being able to download the videos would be great.

Thanks!

Hi all, I searched the documentation but couldn’t find whether it is possible to train directly on a .csv file, or whether we must convert it into images.

Could you help me?
thanks for your time

Hi Jeremy. I am registered for the Part 2 course and get offloaded a lot with access issues to my Part 2 course. I have been in contact with the Data Institute for a few weeks trying to get proper access. Please support access for my user ID.

Suggest Direction

After years of dabbling with ML, I finally feel I’m making progress thanks to Fast.AI. I have my image classifier working and hosted!

One problem I’m trying to solve is locating the edges of these boards. I’ve tried OpenCV with limited success and thought ML might help.

What do you think?

Thanks,
Russ

I just learned how to work with tabular data in lesson 4. I tried using it on a Kaggle data set, but I am getting large loss values. I have tried various things to resolve this but have not been able to determine the issue. Can anyone please help me out with this? Here is the link to the kernel.
Thank you :slight_smile: