Should we use this thread to suggest projects/papers we are interested in? Having a project always helped me in the past.
I am looking to find something cybersecurity-focused this time. I am considering how I might best use access-log behavior to drive recommendations.
I have experience in cybersecurity using random forests etc., and I am interested in using deep learning. If you want, we can create a group and work on the project.
Hey guys,
How do I use my model to predict larger image files?
Let’s say that I trained my model on 256x256 images. What commands do I use, or how do I get predictions from my model on new external images that are of size 4096x4096?
Hi @mindtrinket, I got a chance to look at the paper at a very high level.
Goal: Detect malware from executables by extracting features from malware binaries and portable executable (PE) headers.
Idea: Use NLP to detect malware on a time series dataset. Analyze the PE header to distinguish between packed and non-packed executables.
Their Approach:
STEP 1: Extract all ASCII strings from the malicious and benign sets. Convert each word into a vector.
STEP 2: Sort words by frequency and build a language model on those words.
STEP 3: Use a Doc2Vec model on the frequent words, and construct an LSI model using TF-IDF scores.
STEP 4: Apply different models: RF, XGBoost, MLP, CNN, etc.
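To make those steps concrete, here is a rough sklearn sketch of the classical pipeline, using plain TF-IDF in place of the Doc2Vec/LSI step; the sample strings, labels, and hyperparameters are made up, not from the paper:

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.ensemble import RandomForestClassifier

# One "document" per executable: its extracted ASCII strings joined with spaces.
docs = [
    "GetProcAddress LoadLibraryA VirtualAlloc kernel32.dll",   # malicious sample
    "CreateRemoteThread WriteProcessMemory ntdll.dll",         # malicious sample
    "Qt5Core.dll wmain printf Copyright",                      # benign sample
    "libstdc++ main std::vector GCC: (GNU) 9.3.0",             # benign sample
]
labels = [1, 1, 0, 0]  # 1 = malware, 0 = benign

# Step 3 (simplified): TF-IDF scores over the string vocabulary
vec = TfidfVectorizer(lowercase=True)
X = vec.fit_transform(docs)

# Step 4: any of RF / XGBoost / MLP would slot in here; RF is the simplest baseline
clf = RandomForestClassifier(n_estimators=300, random_state=42)
clf.fit(X, labels)
print(clf.predict(vec.transform(["LoadLibraryA VirtualAlloc UPX0"])))
```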
Fastai Approach:
Extract all ASCII strings from the malicious and benign sets and create a labelled set.
Use a language model for prediction.
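A minimal sketch of what that fastai side could look like, assuming the extracted strings end up in a DataFrame with text and label columns (the CSV name, column names, and epoch counts are placeholders; this uses the fastai v2 text API):

```python
import pandas as pd
from fastai.text.all import *

# One row per executable: joined ASCII strings plus a malware/benign label.
df = pd.read_csv('strings_dataset.csv')  # columns: text, label (placeholder file)

# Fine-tune the pretrained AWD-LSTM language model on the string "corpus"
dls_lm = TextDataLoaders.from_df(df, text_col='text', is_lm=True, valid_pct=0.1)
lm = language_model_learner(dls_lm, AWD_LSTM, metrics=Perplexity())
lm.fine_tune(3)
lm.save_encoder('strings_enc')

# Reuse that encoder in a text classifier (the usual ULMFiT recipe)
dls_clf = TextDataLoaders.from_df(df, text_col='text', label_col='label',
                                  valid_pct=0.2, text_vocab=dls_lm.vocab)
clf = text_classifier_learner(dls_clf, AWD_LSTM, metrics=accuracy)
clf.load_encoder('strings_enc')
clf.fine_tune(4)
```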
Question: How do we get malware binary data with PE headers?
Any image can do, either ImageNet or Imagenette, and the labels are just the hash of the images.
The idea is to build a “neural hash” that preserves properties of a secure hash function, such as the uniform distribution of its output.
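To make the labelling step concrete, here is a tiny sketch of building such a hash-labelled set; the folder path, file pattern, and choice of SHA-256 are my assumptions, not from the post:

```python
import hashlib
from pathlib import Path

# Label each image with the hex digest of its raw bytes; the network then has to
# learn to reproduce that mapping. Folder layout and hash choice are placeholders.
image_dir = Path('imagenette2/train')
items = [(p, hashlib.sha256(p.read_bytes()).hexdigest())
         for p in image_dir.rglob('*.JPEG')]
print(items[:3])
```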
I think I will work on setting up a lambda function to grab some malicious executables online and start by just running `strings`. I like this list of malicious executables from Free Malware Sample Sources for Researchers. In particular, I like vx-underground because it could lead to an interesting classification problem down the line.
Then if we grab 50 malware samples and 50 normal executables, we can see if a toy problem works.
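A rough sketch of that extraction step in Python; the regex stands in for the `strings` utility, and the minimum run length and folder names are my own assumptions:

```python
import re
from pathlib import Path

# Pure-Python stand-in for `strings`: runs of printable ASCII of length >= 4
ASCII_RUN = re.compile(rb'[\x20-\x7e]{4,}')

def extract_strings(path):
    return [m.group().decode('ascii') for m in ASCII_RUN.finditer(path.read_bytes())]

# Toy set: ~50 malware and ~50 benign executables in two folders (names are placeholders)
samples = []
for label, folder in [('malware', Path('samples/malware')),
                      ('benign', Path('samples/benign'))]:
    for exe in folder.glob('*'):
        samples.append((label, ' '.join(extract_strings(exe))))
```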
Another thing I was considering: most language models were pretrained on natural language, not assembly code, so starting from a pretrained model would be… problematic.
Great question. I think you will need to adjust the input size (and possibly the layers) at some point, and fast.ai does some of this heavy lifting for you. We see this in some of the lessons, where we start with smaller image sizes in the transforms and move up in size. After all, you are going from about 65K pixels to about 16.8 million pixels.
Which problem are you looking into where you would need to see all of the pixels? Many use cases can work at reduced resolution, which speeds up training without a substantial decrease in accuracy.
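A rough sketch of that progressive-resizing pattern with the fastai v2 vision API; classification is used just to illustrate the idea, and the dataset path, sizes, and epoch counts are placeholders:

```python
from fastai.vision.all import *

path = Path('my_images')  # placeholder dataset organised by class folders

# Train on small crops first...
dls_small = ImageDataLoaders.from_folder(path, valid_pct=0.2,
                                         item_tfms=Resize(256), bs=64)
learn = cnn_learner(dls_small, resnet34, metrics=accuracy)
learn.fine_tune(5)

# ...then keep training the same model on larger crops
learn.dls = ImageDataLoaders.from_folder(path, valid_pct=0.2,
                                         item_tfms=Resize(512), bs=16)
learn.fine_tune(3)
```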
Thank you mate!
I’d say: I’ve trained my model on a dataset of 256x256 images. The dark images are trained to become as bright as the ground truth. Great results. I saved the model. Now I want to produce another bright image by feeding in a new dark photo (of size 4096x4096) and see how it performs then.
I’ve saved my model in .pkl format. Now what’s next?
I tried load_learner and predict, but it seems to distort the larger image by first resizing it to 256x256 and only then applying the prediction.
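Roughly what I tried looks like this (the model and image file names here are just placeholders):

```python
from fastai.vision.all import *

learn = load_learner('brighten.pkl')                          # placeholder export name
pred, _, _ = learn.predict(PILImage.create('dark_4096.png'))  # 4096x4096 input
# The image appears to be resized down to the 256x256 training size before inference.
```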
Another question: let’s say that I want to make photos look even more detailed (i.e., super-resolution). Can I keep training the pre-trained model on another dataset now? Is there any tutorial on combining two different models?
This felt like a good place to post the docker-compose.yml I’m using. I started with the fastai docker-compose.yml, but I wanted to be able to train a model, so I ended up scrapping all of the documentation pieces; I might bring them back at some point.