I have a file at work with ~300 data points, about 4 features, and a continuous variable to predict.
For my workflow at the time I just used a multivariable linear regression model on all the data. However, seeing how easy it is to apply a NN to this dataset, I wanted to give it a go.
By definition I am overfitting: I am fitting a model to the ~300 points and then seeing how well it predicts those same points. I am not creating a prediction tool; this was more of an exercise in understanding the data, as well as learning about tabular learners.
Documenting my learnings here:
from fastai.tabular import *  # fastai v1 tabular API

# The test list reuses the same continuous columns; procs are picked up from the training set
test = TabularList.from_df(df, path=path, cont_names=cont_names)

data = (TabularList.from_df(df, path=path, cont_names=cont_names, procs=procs)
        .split_none()                  # no validation split: deliberately fitting on all the data
        .label_from_df(cols=dep_var)
        .add_test(test)
        .databunch())

nn_arch = [800, 600, 400, 200]  # length = number of hidden layers, values = units per layer
learn = tabular_learner(data, layers=nn_arch, metrics=r2_score)
...
preds, y = learn.get_preds(ds_type=DatasetType.Test)
targs = torch.tensor(df[dep_var].values)  # dep_var is a variable holding the column name, not a literal
r2_score(preds, targs)
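Since the post compares the NN against a multivariable linear regression fit on all the data, here is a minimal sketch of that baseline with numpy (my own helper names; the feature matrix and target are placeholders, not the actual work file):

```python
import numpy as np

def add_bias(X):
    """Prepend a column of ones so the intercept is learned as a coefficient."""
    return np.column_stack([np.ones(len(X)), X])

def fit_ols(X, y):
    """Ordinary least squares; returns [intercept, coef_1, ..., coef_k]."""
    coef, *_ = np.linalg.lstsq(add_bias(X), y, rcond=None)
    return coef

def predict(X, coef):
    return add_bias(X) @ coef

def r2(y_true, y_pred):
    """Coefficient of determination, the same metric fastai's r2_score reports."""
    ss_res = np.sum((y_true - y_pred) ** 2)
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)
    return 1.0 - ss_res / ss_tot
```

As in the NN experiment above, scoring the same ~300 rows the model was fit on measures how well it predicts itself, not how it would generalize.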
I started this off mainly because I hated how fbprophet doesn’t force the output within the upper and lower bounds that were set, so I created a class that does just that. Finding out how easy that is thanks to Jeremy’s lectures:
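The clamping step described here can be sketched roughly like this (my own names, not the author’s actual class; it assumes the forecast arrives as a pandas DataFrame with fbprophet’s usual yhat columns):

```python
import pandas as pd

def clamp_forecast(forecast: pd.DataFrame, floor: float, cap: float,
                   cols=("yhat", "yhat_lower", "yhat_upper")) -> pd.DataFrame:
    """Clip forecast columns so predictions never escape [floor, cap]."""
    out = forecast.copy()
    for col in cols:
        if col in out:
            out[col] = out[col].clip(lower=floor, upper=cap)
    return out
```

Wrapping this into a class that holds the configured cap/floor is a small step from here.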
Hi stev3, nice app!
I’m intrigued: how did you train your model? Did you take a video and split the frames into various classes, such as position and score-or-miss?
Hi @hkristen
I’ve recently been playing with camera-trap datasets, so maybe my notes will be helpful for you.
Many nature-related datasets are here: http://lila.science/datasets. Especially interesting is the Serengeti Dataset. It’s huge, but you can download single files as well:
Hi, this is my first post! I am from Colombia and this is my first mini project from Lesson 2.
Because we use Jupyter notebooks on a daily basis, I decided to build a classifier in honor of this wonderful project and also of the father of modern science, Galileo Galilei, who discovered the four largest moons of Jupiter: Io, Europa, Ganymede, and Callisto.
My dataset was obtained from Google image search, and using the default parameters for the resnet34 model, I obtained a near-zero error rate.
Hi all!! I started the course a few weeks back and just after the second lesson, I made a classifier to distinguish between bulls and buffaloes. https://isitabull.onrender.com/
I created a proposal for the society of actuaries predictive analytics contest and won second place. Any suggestions on how to improve the model are welcome.
I am a relative newbie to this forum. I am using Google Colab to run the fastai notebooks. One of the projects I am running is identifying aerial images of different kinds of infrastructure (schools, airports, etc.). I have downloaded images from Google, but the fastai model is pretty noisy, with a very high error rate, and the learning rate graph isn’t much help: lower learning rates lead to a spike in the error.
Even after optimizing the learning rate, I am not able to reduce the error rate to less than 25%.
Is this a hard problem for CNNs to figure out, or is there an error in my data? I am unable to figure it out. Any help is welcome.
Hello Abhik:
Thank you for your message. Yes, I am an Associate of the Society of Actuaries (ASA). Nice to see more actuaries here as well. For the competitions organized by the SOA, there is no specific place that I know of where they get announced; it is usually by email. The active involvement programs I know of right now are the Kaggle/SOA involvement program (https://www.soa.org/programs/predictive-analytics/kaggle-program/) and the call for essays on actuarial practice and innovation (https://www.soa.org/research/opportunities/actuarial-practice-innovation/).
Best,
Maria
@Jabberwocky From my experience with a classifier I tried to get working to identify four classes of Boeing’s commercial airplanes, I think the issue was the quality and quantity of the data. The differences among them are very subtle, and you also have to deal with airplanes in different colors. I could never obtain an error rate below 0.46.
I’m sure you could do this in Google Colab! They make it a bit difficult to work with cloned repositories… I think what you need to do is download the repo/zip file into a Google Drive folder, and then you can right-click a notebook to open it in Colab.
Colab is a bit messy for this kind of thing, though. What I like to do is development on my local machine, especially for this project because it involves a lot of image/file management, and then run it on a cloud GPU only once everything is in place and running the way I want. It’s just the training that needs a GPU. I’d say I spent 90% of my time building the inference loop and the OpenCV/matplotlib image formatting, so that was fine running on my local machine’s CPU.
I think the architecture is cool because it isolates the target and that makes it a much better image-classification problem to solve. Whether running two networks at once per image is the optimal way to do it…? I’m not so sure!
Hi! I built a quick video-to-frame labeling tool for ML models. I am using it in a custom classifier with transfer learning: I collected a bunch of long videos for my application that I needed to label fast. Here’s the code if it’s useful to you: https://github.com/Mascobot/video_frame_classifier
Just built a custom classifier for microbe classification. It can recognize bacilli (rod-shaped) and cocci (sphere-shaped) bacteria at an error rate of 4%, despite different zoom levels, brightness, and contrast.
To say I’m excited at what fastai can do is a massive understatement.
However, unlike the Amazon multi-label classification, here only one label from each group is possible. So is it possible to get that result efficiently with this architecture?
Right now I am getting 3 labels (boy, girl, sad) at a time with a threshold of 0.2, which is not possible in my case.
Please review and suggest if you have any ideas on how to make it better.
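One common workaround, sketched under my own assumption about how the labels group (this is not the poster’s actual code): if the labels fall into mutually exclusive groups such as {boy, girl} and {sad, happy}, take the argmax within each group instead of thresholding every activation at 0.2:

```python
def pick_one_per_group(probs, classes, groups):
    """For each mutually exclusive group of labels, keep only the highest-scoring one.

    probs   -- per-class activations, in the same order as `classes`
    classes -- list of class names
    groups  -- list of lists; each inner list is one set of mutually exclusive labels
    """
    index = {c: i for i, c in enumerate(classes)}
    return [max(group, key=lambda c: probs[index[c]]) for group in groups]
```

For example, with activations [0.7, 0.4, 0.6, 0.1] over ["boy", "girl", "sad", "happy"] and groups [["boy", "girl"], ["sad", "happy"]], this yields exactly one label per group rather than every label above the threshold.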
I know this isn’t a great error rate, but for Lesson 1, not knowing what I’m doing and using a fairly small dataset (~200 images), I feel pretty good about this model!
I was surprised how well back/front yard was predicted. The one interesting case was the back yard with a couch being categorized as a living room. In retrospect, I should’ve added a patio category to account for something like that.