Guide to downloading the French Amazon Customer Reviews
Read the information page and the license of the Amazon Customer Reviews Dataset.
-
Create an AWS Free Tier account.
-
Log in to the IAM console of your AWS account with the login and password of the account you just created.
-
Create an IAM admin user and group by following these instructions.
-
Create your IAM user access keys (access key ID and secret access key) by following these instructions. DO NOT FORGET to save both keys.
-
Install the AWS Command Line Interface (AWS CLI) in an Ubuntu terminal on your computer by following these instructions.
-
Configure your AWS CLI by following these instructions.
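Before moving on, it can help to confirm that both keys were actually saved. The sketch below assumes the AWS CLI wrote them to its usual credentials file (~/.aws/credentials, INI format, profile named "default"). The function name has_aws_keys is hypothetical and only serves this illustration.

```python
import configparser
from pathlib import Path


def has_aws_keys(credentials_file, profile="default"):
    """Return True if the given profile defines both access keys.

    Assumption: the file uses the INI layout written by `aws configure`.
    """
    parser = configparser.ConfigParser()
    parser.read(credentials_file)  # a missing file is silently ignored
    if not parser.has_section(profile):
        return False
    section = parser[profile]
    wanted = ("aws_access_key_id", "aws_secret_access_key")
    return all(section.get(key) for key in wanted)


if __name__ == "__main__":
    path = Path.home() / ".aws" / "credentials"
    status = "keys found" if has_aws_keys(path) else "keys missing"
    print(path, "->", status)
```

If the check reports missing keys, running the configuration step again is usually enough.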
-
With your AWS CLI, you can list the available review datasets in the bucket with the ls command by typing the following in your Ubuntu terminal:
aws s3 ls s3://amazon-reviews-pds/tsv/
List (2017-11-24):
amazon_reviews_multilingual_DE_v1_00.tsv.gz
amazon_reviews_multilingual_FR_v1_00.tsv.gz
amazon_reviews_multilingual_JP_v1_00.tsv.gz
amazon_reviews_multilingual_UK_v1_00.tsv.gz
amazon_reviews_multilingual_US_v1_00.tsv.gz
-
To download data using the AWS CLI, you can use the cp command. For instance, the following commands copy the file named amazon_reviews_multilingual_FR_v1_00.tsv.gz to your local data folder:
cd path_to_your_data_folder
aws s3 cp s3://amazon-reviews-pds/tsv/amazon_reviews_multilingual_FR_v1_00.tsv.gz .
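If you want to fetch several languages from a notebook, the same cp command can be built from the marketplace code (DE, FR, JP, UK, US in the listing above). This is only a sketch: the helper names are hypothetical, the file-name pattern is inferred from the 2017-11-24 listing, and download() simply calls the aws executable, which must already be installed and configured.

```python
import subprocess

# Bucket prefix taken from the `aws s3 ls` command used in this guide.
BUCKET_PREFIX = "s3://amazon-reviews-pds/tsv/"


def reviews_filename(marketplace):
    """File name for a marketplace code, following the listed pattern."""
    return f"amazon_reviews_multilingual_{marketplace.upper()}_v1_00.tsv.gz"


def build_cp_command(marketplace, dest="."):
    """Return the `aws s3 cp` command as an argument list."""
    source = BUCKET_PREFIX + reviews_filename(marketplace)
    return ["aws", "s3", "cp", source, dest]


def download(marketplace, dest="."):
    """Run the copy; raises CalledProcessError if the AWS CLI fails."""
    subprocess.run(build_cp_command(marketplace, dest), check=True)
```

For example, download("fr", "path_to_your_data_folder") would run the same copy as the command above, with the data folder given explicitly instead of via cd.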
-
Unzip your file:
gzip -d amazon_reviews_multilingual_FR_v1_00.tsv.gz
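If the gzip command is not available, for example on another operating system, the file can also be decompressed with the Python standard library. The sketch below mimics gzip -d (it writes the file without the .gz suffix and deletes the archive by default). The function name gunzip is hypothetical.

```python
import gzip
import shutil
from pathlib import Path


def gunzip(src, keep_original=False):
    """Decompress `src` next to itself and return the new path."""
    src = Path(src)
    if src.suffix != ".gz":
        raise ValueError(f"expected a .gz file, got {src.name}")
    dest = src.with_suffix("")  # drops only the trailing ".gz"
    # Stream the data in chunks so the whole file never sits in memory.
    with gzip.open(src, "rb") as f_in, open(dest, "wb") as f_out:
        shutil.copyfileobj(f_in, f_out)
    if not keep_original:
        src.unlink()  # same default behaviour as `gzip -d`
    return dest
```

Calling gunzip("amazon_reviews_multilingual_FR_v1_00.tsv.gz") from the data folder should leave a .tsv file with the same base name.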
-
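In your Jupyter notebook, open the TSV file with pandas, for example with the following code (see the list of column names):
import pandas as pd
from pathlib import Path

# Folder that holds the downloaded file (same placeholder as in the cp step)
path_data = Path('path_to_your_data_folder')
# Load only the columns needed for the classifier
fields = ['review_id', 'review_body', 'star_rating']
df = pd.read_csv(path_data/'amazon_reviews_multilingual_FR_v1_00.tsv', delimiter='\t', encoding='utf-8', usecols=fields)
df = df[fields]  # reorder the columns to match fields
# Replace missing review bodies with the string 'NA'
df.loc[pd.isna(df.review_body), 'review_body'] = 'NA'
df.head()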
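To pick the columns to load, it can be handy to read the header row straight from the file rather than from the documentation. The sketch below uses only the standard library and works on either the .tsv or the .tsv.gz file. The function name column_names is hypothetical, and quoting is disabled on the assumption that the files contain no quote or escape characters.

```python
import csv
import gzip
from pathlib import Path


def column_names(tsv_path):
    """Return the header row of a tab-separated file (plain or gzipped)."""
    tsv_path = Path(tsv_path)
    opener = gzip.open if tsv_path.suffix == ".gz" else open
    with opener(tsv_path, "rt", encoding="utf-8", newline="") as f:
        # Assumption: fields are never quoted, so treat quotes as plain text.
        reader = csv.reader(f, delimiter="\t", quoting=csv.QUOTE_NONE)
        return next(reader)
```

Only the first line is read, so the call returns quickly even on the full dataset.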
That’s it. You can now fine-tune your language model and then your classifier on the French Amazon Customer Reviews, using the ULMFiT method implemented in the nn-vietnamese.ipynb notebook. Have fun and please publish your results. Thanks!