I've had some interesting results from submitting my predictions to the Kaggle competition.
On my first attempt, I realized that I had trained the model on only a sample of the data, for a single epoch. (Am I using this term correctly? Or should it be: "...a sample of the data with 1 training epoch"?)
My first score was 0.12370.
I decided to run the prediction again; this time, however, I switched to the full training set and ran 3 epochs.
When I submitted my results, the second score was 0.18837!
The accuracy of each model, according to the output in my Jupyter notebook, was ~87% and ~98% respectively.
Given that the reported accuracy is higher for the second model, why would its Kaggle score be worse?
The only reasons I can think of are:
1. A peculiarity of the scoring system.
2. The first model got "lucky"?
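One thing worth checking is whether the accuracy printed in the notebook is *training* accuracy: a model can score very high on the data it was trained on while generalizing worse to unseen data, which would show up exactly as a better notebook number and a worse leaderboard score. The toy sketch below reproduces that gap in miniature; the synthetic data and both "models" are made up for illustration and have nothing to do with the actual competition or metric.

```python
import random

random.seed(0)

# Toy 1-D dataset: the true label is 1 when x > 0.5, but 20% of labels are flipped (noise).
def make_data(n):
    data = []
    for _ in range(n):
        x = random.random()
        y = 1 if x > 0.5 else 0
        if random.random() < 0.2:  # inject label noise
            y = 1 - y
        data.append((x, y))
    return data

train = make_data(200)
test = make_data(200)

# "Memorizing" model: 1-nearest-neighbour lookup into the training set.
# On the training data itself, every point's nearest neighbour is that
# point, so training accuracy is (essentially) perfect.
def predict_1nn(x):
    return min(train, key=lambda p: abs(p[0] - x))[1]

# Simple model: the true threshold rule, ignoring the memorized noise.
def predict_threshold(x):
    return 1 if x > 0.5 else 0

def accuracy(model, data):
    return sum(model(x) == y for x, y in data) / len(data)

print("1-NN train accuracy:", accuracy(predict_1nn, train))   # near-perfect
print("1-NN test accuracy: ", accuracy(predict_1nn, test))    # noticeably lower
print("rule test accuracy: ", accuracy(predict_threshold, test))
```

The memorizing model's training accuracy looks great, but on fresh data it pays for having memorized the noise. If the ~98% in the notebook is training accuracy, holding out a validation split before submitting would show whether the same thing is happening here.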