Share your work here ✅

Johnpal · December 4, 2019, 4:19pm

I want to share my second mini project from the 4th lesson.

I built a phishing classifier using the fast.ai tabular module and the following dataset: https://data.mendeley.com/datasets/h3cgnj8hft/1

The dataset contains 48 features extracted from 5000 phishing webpages and 5000 legitimate webpages.

I obtained 98% accuracy, outperforming benchmarks from traditional ML algorithms used for phishing detection, such as Random Forest and SVM. For instance, the paper accompanying the dataset says: "The overall experimental results suggest that HEFS performs best when it is integrated with Random Forest classifier, where the baseline features correctly distinguish 94.6% of phishing and legitimate websites using only 20.8% of the original features." https://www.sciencedirect.com/science/article/pii/S0020025519300763#ec-research-data
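To put those two numbers side by side, here is a quick back-of-the-envelope sketch in Python. It only uses the figures quoted above (98%, 94.6%, 20.8%, 48 features); the variable names are just illustrative. It also assumes the two accuracies are comparable, which is only roughly true, since the paper's figure is for a reduced feature set and may come from a different train/test split.

```python
# Figures quoted in the post and the paper; variable names are illustrative.
n_features = 48               # features in the dataset
nn_accuracy = 0.98            # fast.ai tabular model (this post)
rf_accuracy = 0.946           # HEFS + Random Forest baseline (paper quote)
rf_feature_fraction = 0.208   # share of the original features the baseline used

# Number of features the baseline kept (20.8% of 48, rounded).
rf_features = round(n_features * rf_feature_fraction)

# Error rates, and how much of the baseline's error is removed.
nn_error = 1 - nn_accuracy
rf_error = 1 - rf_accuracy
relative_error_reduction = (rf_error - nn_error) / rf_error

print(f"Baseline features used: {rf_features} of {n_features}")
print(f"Error rate: {rf_error:.1%} -> {nn_error:.1%}")
print(f"Relative error reduction: {relative_error_reduction:.0%}")
```

So the baseline works from about 10 of the 48 features, and going from 94.6% to 98% accuracy corresponds to roughly 63% fewer misclassified pages, under the assumptions above.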

So, as @jeremy said in the 4th lesson: "It’s not true that neural nets are not useful for tabular data; in fact, they are extremely useful."

I really appreciate the work that Jeremy, Rachel and the folks from the fast.ai team are doing to bring AI to everyone!

Update: Here is the notebook: https://github.com/johnagr/Phishing-Classifier-