Kindly use this topic for any non-beginner discussions related to the lesson 1 livestream.
Please also remember to take some time to answer others’ questions in the lesson 1 topic.
Edit from Jeremy: feel free to discuss more advanced topics of any lesson in this thread
Should we use this thread to suggest projects/papers we are interested in? Having a project always helped me in the past.
I am looking to find something cybersecurity-focused this time. I am considering how best to model behavior in access logs to drive recommendations.
I have experience in cybersecurity using random forests and similar methods, and I am interested in using deep learning. If you want, we can create a group and work on the project.
I have a background in cryptography, and I was wondering whether I could train an image model to calculate the SHA-256 hash. It is just an idea.
How do I use my model to predict larger image files?
Let’s say that I trained my model on 256x256 images. What commands do I use or how do I predict my model’s results on new external images that are of size 4096x4096?
Will the results apply to them as well?
Let me read the paper… then let us discuss what we find. I just want any deep learning approach…
Which images, and how would we get them?
Hi @mindtrinket I got a chance to look at the paper at a very high level.
Goal: Detect malware in executables by extracting features from the malware binaries and their portable executable (PE) headers.
Idea: Use NLP to detect malware on a time series dataset. Analyze the PE header to distinguish between packed and non-packed executables.
STEP1: Extract all ASCII strings from the malicious and benign sets.
Convert words into vectors
STEP2: Sort words by frequency; a language model is built on the words.
STEP3: A Doc2vec model is used on the frequent words, and LSI is constructed using TF-IDF scores.
Apply different models
STEP4: Apply RF, XGB, MLP, CNN, etc.
- Extract all ASCII strings from the malicious and benign sets. Create a labelled set.
- Use a language model for prediction.
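As a rough illustration of STEP1 and STEP2, here is a minimal sketch that pulls printable ASCII runs out of raw bytes and counts token frequencies. It is not the paper's code: the function names and the two byte blobs are made up for illustration, and the minimum run length of 4 simply mirrors the default of the Unix strings tool.

```python
import re
from collections import Counter

# Runs of 4 or more printable ASCII bytes (same default minimum as Unix `strings`).
ASCII_RUN = re.compile(rb"[\x20-\x7e]{4,}")

def extract_strings(data: bytes) -> list[str]:
    """Return every printable-ASCII run of length >= 4 found in `data`."""
    return [m.decode("ascii") for m in ASCII_RUN.findall(data)]

def top_words(samples: list[bytes], k: int = 10) -> list[tuple[str, int]]:
    """Count whitespace-separated tokens across all samples; return the k most frequent."""
    counts = Counter()
    for data in samples:
        for s in extract_strings(data):
            counts.update(s.split())
    return counts.most_common(k)

# Made-up byte blobs standing in for executables (these are NOT real PE files).
fake_malicious = b"\x00\x01MZ\x90\x00CreateRemoteThread\x00\xffVirtualAlloc\x00\x02ab"
fake_benign = b"\x7fELF\x00GetSystemTime\x00\x00VirtualAlloc\x00"

# Labelled set: (list of strings, label) with 1 = malicious, 0 = benign.
labelled = [(extract_strings(fake_malicious), 1), (extract_strings(fake_benign), 0)]
print(labelled)
print(top_words([fake_malicious, fake_benign], k=3))
```

With real samples, the bytes would come from reading each file in binary mode, and the frequent tokens would feed whatever vectorizer or language model comes next.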
Question: How do we get malware binaries with PE headers?
Any images will do, either ImageNet or Imagenette, and the labels are just the hashes of the images.
The idea is to build a “neural hash” that preserves the properties of a secure hash function, such as the uniform distribution of its output.
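To make the labelling step concrete, here is a minimal sketch that turns the SHA-256 digest of a file's bytes into a 256-element bit vector, which is one possible target format (256 binary outputs). The helper name and the placeholder byte strings are assumptions for illustration; real labels would be computed from the raw image files.

```python
import hashlib

def sha256_bits(data: bytes) -> list[int]:
    """Return the SHA-256 digest of `data` as 256 bits (0 or 1), most significant bit first."""
    digest = hashlib.sha256(data).digest()
    return [(byte >> shift) & 1 for byte in digest for shift in range(7, -1, -1)]

# Placeholder bytes standing in for the raw contents of two image files.
img_a = b"placeholder image bytes A"
img_b = b"placeholder image bytes B"

bits_a, bits_b = sha256_bits(img_a), sha256_bits(img_b)

# Number of label bits that differ between two inputs that differ in one character.
differing = sum(a != b for a, b in zip(bits_a, bits_b))
print(len(bits_a), differing)
```

Printing the number of differing bits for near-identical inputs gives a feel for how little the labels have in common even when the inputs are almost the same.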
That was my take.
They used something beyond strings, via FFRI/ffridataset-scripts on GitHub (“Make datasets like FFRI Dataset”), which is very interesting.
I think I will work on setting up a lambda function to grab some malicious executables online and start by just running strings on them. I like the list of malicious executables in “Free Malware Sample Sources for Researchers”. In particular, I like vx-underground because it could lead to an interesting classification problem down the line.
Then, if we grab 50 malware samples and 50 normal executables, we can see whether a toy problem works.
Another thing I was considering: many of the language models were trained on natural language, not assembly code, so starting from a pretrained model would be… problematic.
Great question. I think you will need to adjust the layers at some point in the future (fast.ai does some of this heavy lifting). We see this in some of the lessons where we start from smaller-sized images in transforms and move them up in size. After all, you are going from about 65K pixels (256x256) to about 16,777K pixels (4096x4096).
Which problem are you looking into where you would need to see all of the pixels? Many use cases can be reduced to speed up training without a substantial decrease in accuracy.
Thank you mate!
Here is my situation: I’ve trained my model on a dataset of 256x256 images. It learns to make dark images as bright as the ground truth, with great results. I saved the model. Now I want to produce a bright image from another dark photo (of size 4096x4096) and see how the model performs.
I’ve saved my model as a .pkl file. Now what’s next?
I tried predict, but it seems to distort the larger image by first resizing it to 256x256 and only then applying its prediction.
Let’s say that I want to make photos look even more detailed (super-resolution): can I keep training the pre-trained model on another dataset? Is there a tutorial on combining two different models?
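On the resizing issue: one common workaround, not something confirmed in this thread, is to run the model tile by tile at the size it was trained on and paste the outputs back together. The sketch below shows only the tiling logic on plain nested lists. `fake_brighten` is a hypothetical stand-in for the trained model, and the sketch does not handle overlap, so visible seams at tile borders are possible with a real model.

```python
from typing import Callable, Iterator

Image = list[list[float]]  # a grayscale image as rows of pixel values

def tile_boxes(width: int, height: int, tile: int) -> Iterator[tuple[int, int, int, int]]:
    """Yield (left, top, right, bottom) boxes covering the image in tile-sized steps."""
    for top in range(0, height, tile):
        for left in range(0, width, tile):
            yield left, top, min(left + tile, width), min(top + tile, height)

def predict_tiled(img: Image, predict_tile: Callable[[Image], Image], tile: int = 256) -> Image:
    """Apply `predict_tile` to each tile and paste each result back at the same position."""
    height, width = len(img), len(img[0])
    out = [[0.0] * width for _ in range(height)]
    for left, top, right, bottom in tile_boxes(width, height, tile):
        patch = [row[left:right] for row in img[top:bottom]]
        result = predict_tile(patch)
        for dy, row in enumerate(result):
            out[top + dy][left:right] = row
    return out

def fake_brighten(patch: Image) -> Image:
    """Hypothetical stand-in for the trained model: add 0.5 to every pixel, capped at 1.0."""
    return [[min(1.0, p + 0.5) for p in row] for row in patch]

# Tiny 8x8 example processed in 4x4 tiles.
small = [[0.25] * 8 for _ in range(8)]
brightened = predict_tiled(small, fake_brighten, tile=4)

# Number of 256x256 tiles needed to cover a 4096x4096 image.
n_tiles = len(list(tile_boxes(4096, 4096, 256)))
print(n_tiles)
```

In practice `predict_tile` would wrap the loaded model, and the patches would be image arrays or tensors rather than nested lists.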
Can you post your code? Something sounds off.
This felt like a good place to post the docker-compose.yml I’m using. I started with the fastai docker-compose.yml, but I wanted to be able to train a model, so I ended up scrapping all of the documentation pieces. I might bring them back at some point.
# image: fastai/codespaces
- driver: nvidia
- LIB_INSTALL_TYPE=. #optionally change this locally to .[dev] to install dev packages as well
command: bash -c "pip install jupyter && pip install -e $$LIB_INSTALL_TYPE && jupyter notebook --allow-root --no-browser --ip=0.0.0.0 --port=8080 --NotebookApp.token='' --NotebookApp.password=''"
Please add me to the list of people interested in applications of deep learning to cyber security.
Any idea how to create a time-lapse of the predicted images?
I want to save a prediction batch every epoch, to create a time-lapse video of the images as the model becomes more and more trained.
You could use a Learner callback for that.
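To show the shape of that idea, here is a framework-agnostic sketch of the callback pattern in plain Python. It is not fastai code: the `Callback`, `LinearModel`, and `fit` definitions below are toy stand-ins written for illustration. In fastai, the equivalent hook would be an `after_epoch` method on a `Callback` subclass passed to the Learner through `cbs`.

```python
class Callback:
    """Toy base class: the training loop calls `after_epoch` once per epoch."""
    def after_epoch(self, epoch, model):
        pass

class LinearModel:
    """Toy model y = w * x, a stand-in for a real network."""
    def __init__(self, w: float = 0.0):
        self.w = w

    def predict(self, xs: list[float]) -> list[float]:
        return [self.w * x for x in xs]

class SnapshotPredictions(Callback):
    """Record the model's predictions on one fixed batch after every epoch."""
    def __init__(self, fixed_batch: list[float]):
        self.fixed_batch = fixed_batch
        self.frames: list[list[float]] = []

    def after_epoch(self, epoch, model):
        self.frames.append(model.predict(self.fixed_batch))

def fit(model, xs, ys, epochs, lr, callbacks):
    """Gradient descent on mean squared error, calling each callback after every epoch."""
    for epoch in range(epochs):
        grad = sum(2 * (model.w * x - y) * x for x, y in zip(xs, ys)) / len(xs)
        model.w -= lr * grad
        for cb in callbacks:
            cb.after_epoch(epoch, model)

xs, ys = [1.0, 2.0, 3.0], [2.0, 4.0, 6.0]  # toy data where the true w is 2
snap = SnapshotPredictions(fixed_batch=[1.0, 2.0])
fit(LinearModel(), xs, ys, epochs=5, lr=0.05, callbacks=[snap])

# One "frame" per epoch; with images, each frame would be saved to disk instead.
for i, frame in enumerate(snap.frames):
    print(i, frame)
```

Using the same fixed batch every epoch is the key design choice: it keeps the frames comparable, so the saved images can later be assembled into a video in epoch order.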