Uploading my own dataset

Hello everyone,

Firstly a big thank you to Jeremy and Rachel for putting out this wonderful course for us!

I want to upload my own dataset, not one downloaded from the internet, but I do not know how to get it into the Jupyter notebook running on the AWS instance.

Here’s what I tried:

  1. Once I ssh into AWS and open the Jupyter notebook, there is an Upload option on the right-hand side of the browser, but when I tried it, it said the file size exceeds 25MB.

  2. Then I tried dragging the folders from my computer into Jupyter via the same Upload button, and nothing really happened.

Sorry to ask a silly question, but can someone please guide me on how to upload data from my computer to the Jupyter notebook through AWS?

2 Likes

Hi there, what you really want is scp, which is the cp command for SSH connections. I hope this link helps: https://stackoverflow.com/questions/11822192/ssh-scp-local-file-to-remote-in-terminal-mac-os-x. It's a secure way of transferring files.

But specifically for AWS you also need your .pem file:

  • Go to the local directory that has the files you want to upload.
  • Then do something like this from the terminal: scp -i key.pem magento.tar.gz user@xx.x.x.xx:specify_path_here
  • magento.tar.gz is the file you'd like to transfer
  • user@xx.x.x.xx would be something like ec2-user@xx.x.x.xx or ubuntu@xx.x.x.xx, depending on which type of machine you run
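For example, with the fast.ai AWS setup the command might look roughly like this (the key path, file name, and address are placeholders; substitute your own):

    scp -i ~/.ssh/aws-key.pem ~/Desktop/dataset.zip ubuntu@xx.xx.xx.xx:~/nbs/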
2 Likes

I’d use rsync instead of scp for a large dataset — my intuition is that rsync is better for lots of files, but I could be wrong.

I find the configuration of both pretty confusing. You can type man rsync, but those docs are still pretty complex, so I prefer tldr:

rsync

Transfer files either to or from a remote host (not between two remote hosts).
Can transfer single files, or multiple files matching a pattern.

  • Transfer file from local to remote host:
    rsync path/to/file remote_host_name:remote_host_location
  • Transfer file from remote host to local:
    rsync remote_host_name:remote_file_location local_file_location
  • Transfer file in archive (to preserve attributes) and compressed (zipped) mode:
    rsync -az path/to/file remote_host_name:remote_host_location
  • Transfer a directory and all its children from a remote to local:
    rsync -r remote_host_name:remote_folder_location local_folder_location
  • Transfer only updated files from remote host:
    rsync -ru remote_host_name:remote_folder_location local_folder_location
  • Transfer file over SSH and show progress:
    rsync -e ssh --progress remote_host_name:remote_file local_file
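For this AWS use case you can combine those flags and point rsync at your .pem key over SSH. A rough sketch, with placeholder paths and address (adjust them for your own setup):

    rsync -avz --progress -e "ssh -i ~/.ssh/aws-key.pem" ~/Desktop/dataset/ ubuntu@xx.xx.xx.xx:~/nbs/data/

The trailing slash on dataset/ means the folder's contents are copied into ~/nbs/data/ rather than creating another dataset folder inside it.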

Hi Kerem, thanks for your answer.
I used scp in the following manner:

I have my Dataset.zip on the desktop (Windows), and I'm running Cygwin64 to connect to AWS.
After I ssh into AWS,

I have something like:

scp -i ~/.ssh/aws-key.pem ~/Desktop/Dataset.zip ubuntu@ec2-xx-xxx-xxx-xxx:nbs/

And then I'm getting the following error:

Warning: Identity file /home/ubuntu/.ssh/aws-key.pem not accessible: No such file or directory.
ssh: Could not resolve hostname ec2-xx-xxx-xxx-xxx: Name or service not known
lost connection

I think there are two problems. The first is that the .pem file is not accessible. I followed the video and lesson 1 very closely, and my understanding is that the .pem file is in the /home/ubuntu/.ssh/ directory. I checked the AWS console and see that the key pair is called aws-key.

I really don't know what I'm doing wrong.

Hello,

This is from amazon’s help page:
scp -i /path/my-key-pair.pem /path/SampleFile.txt ec2-user@ec2-198-51-100-1.compute-1.amazonaws.com:~

I think the problem is that you are not providing the correct host name (ec2-xx-xxx-xxx-xxx). Did you check it in the console? It should be listed under Public IP. You need that public IP (or the full public DNS) to connect. If you have successfully connected to your Amazon machine with ssh from the terminal, it is the same address.
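A quick sanity check (the address below is a placeholder): if

    ssh -i ~/.ssh/aws-key.pem ubuntu@ec2-xx-xxx-xxx-xxx.compute-1.amazonaws.com

works from your local terminal, then scp with exactly the same -i key and user@host part should work too; only the file arguments differ.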

1 Like

Hello,

Since I'm using Windows and Cygwin64, is that why there is an error?
My data is on the desktop of my Windows machine, but I think the command I ran makes it look like it's on the Ubuntu machine?

Hi, yes, I put in the correct host name; I just didn't write it out in this forum. But the error still persists.

Hi,
once you ssh in, you are on the Ubuntu instance, right? So where does the .pem file live from that machine's point of view? Both the key and Dataset.zip are on your Windows machine, so you need to run scp from a local Cygwin window before you ssh in, not from inside the AWS instance.
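From a local Cygwin prompt, the command would look roughly like this (the Windows username, key location, and address are placeholders; point them at wherever the files actually live on your machine):

    scp -i /cygdrive/c/Users/your_name/Downloads/aws-key.pem /cygdrive/c/Users/your_name/Desktop/Dataset.zip ubuntu@ec2-xx-xxx-xxx-xxx.compute-1.amazonaws.com:~/nbs/

In Cygwin, Windows drives are mounted under /cygdrive, so C:\Users\your_name\Desktop becomes /cygdrive/c/Users/your_name/Desktop.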

It may be easier to use FileZilla for data transfer to the AWS server. Or you can follow the steps provided by @kcturgutlu.

Yes, correct. FileZilla is a much easier option. Thanks.

1 Like

In FileZilla, when I try adding the aws-key-fast-ai.pem file, it says:

“Command failed
Could not get reply from fzputtygen.”

I Googled for help but couldn’t find a solution. Do you know what it means?

Thanks.

I tried it a few more times, and it worked.
¯\_(ツ)_/¯

1 Like