Non-Beginner Discussion

init_27 · April 26, 2022, 7:27am

Hi All,

Kindly use this topic for any non-beginner discussions related to the lesson 1 livestream.

Please also remember to take some time to answer others’ questions in the lesson 1 topic.

Thank you!

Edit from Jeremy: feel free to discuss more advanced topics of any lesson in this thread

mindtrinket · April 26, 2022, 12:12pm

Should we use this thread to suggest projects/papers we are interested in? Having a project always helped me in the past.

I am looking to find something cybersecurity-focused this time. I am considering how I might be able to best handle access logs behavior to drive recommendations.

rsrivastava · April 26, 2022, 4:38pm

I have experience in cybersecurity using random forests etc., and I am interested in using deep learning. If you want, we can create a group and work on the project.

mindtrinket · April 26, 2022, 11:10pm

We can. I was going to see if I can re-create this paper: Applying NLP techniques to malware detection in a practical environment | SpringerLink

jlobo · April 26, 2022, 11:46pm

I have knowledge of cryptography, and I was wondering if I could train an image model to calculate the SHA-256 hash. It is just an idea.

Danrohn · April 28, 2022, 1:57pm

Hey guys,
How do I use my model to predict larger image files?

Let’s say that I trained my model on 256x256 images. What commands do I use or how do I predict my model’s results on new external images that are of size 4096x4096?

Will the results apply to them as well?

Thanks

rsrivastava · April 28, 2022, 2:17pm

Let me read the paper… then let us discuss what we find. I just want any deep learning approach…

rsrivastava · April 28, 2022, 2:17pm

Which images, and how would you get them?

rsrivastava · April 28, 2022, 6:43pm

Hi @mindtrinket I got a chance to look at the paper at a very high level.

Goal: Detect malware in executables by extracting features from malware binaries and portable executable (PE) headers.

Idea: Use NLP to detect malware on a time series dataset. Analyze the PE header to distinguish between packed and non-packed executables.

Their Approach:
STEP1: Extract all ASCII strings from the malicious and benign sets.
Convert words into vectors
STEP2: Sort words by frequency; a language model is built on the words.
STEP3: A Doc2vec model is used on the frequent words, and LSI is constructed using TF-IDF scores.
Apply different models
STEP4: Apply RF, XGB, MLP,CNN etc.
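To make the TF-IDF part of STEP3 concrete, here is a minimal pure-Python sketch. It assumes the common definitions tf = count / document length and idf = log(N / df); the paper may use a different variant, and the toy "documents" below are made-up string lists, not real samples.

```python
import math
from collections import Counter


def tfidf(docs):
    """Return one {term: score} dict per document.

    docs: list of token lists, e.g. the strings extracted from each binary.
    Assumes tf = count / doc length and idf = log(N / df).
    """
    n_docs = len(docs)
    # Document frequency: in how many documents does each term appear?
    df = Counter(term for doc in docs for term in set(doc))
    scores = []
    for doc in docs:
        counts = Counter(doc)
        total = len(doc) or 1  # avoid division by zero for empty documents
        scores.append(
            {t: (c / total) * math.log(n_docs / df[t]) for t, c in counts.items()}
        )
    return scores


# Toy "documents": hypothetical string lists standing in for real binaries.
docs = [
    ["kernel32.dll", "CreateFileA", "GetProcAddress"],
    ["kernel32.dll", "VirtualAlloc", "VirtualAlloc", "GetProcAddress"],
    ["kernel32.dll", "printf"],
]
scores = tfidf(docs)
```

A term that appears in every document (here kernel32.dll) scores zero, which is the point of the weighting: strings common to both malicious and benign files carry little signal.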

Fastai Approach:

Extract all ASCII strings from the malicious and benign sets. Create a labelled set.
Use a language model for prediction.
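A rough sketch of the extraction and labelling step, assuming a regex that mimics the `strings` tool (runs of at least 4 printable ASCII bytes). The function names and the byte blobs are hypothetical; real inputs would come from reading each file in binary mode.

```python
import re


def ascii_strings(data: bytes, min_len: int = 4):
    """Return runs of at least min_len printable ASCII bytes (0x20-0x7e)."""
    pattern = re.compile(rb"[\x20-\x7e]{%d,}" % min_len)
    return [match.decode("ascii") for match in pattern.findall(data)]


def build_labelled_set(samples):
    """Turn (raw_bytes, label) pairs into (text, label) pairs.

    Each text is the file's extracted strings joined by spaces, so it can be
    fed to a text classifier as one document per file.
    """
    return [(" ".join(ascii_strings(raw)), label) for raw, label in samples]


# Made-up byte blobs standing in for open(path, "rb").read()
samples = [
    (b"\x00\x01MZ\x90\x00kernel32.dll\x00\xffVirtualAlloc\x00ab\x00", "malicious"),
    (b"\x7fELF\x02\x01printf\x00\x00hello world\x00", "benign"),
]
dataset = build_labelled_set(samples)
```

The resulting (text, label) pairs could then be loaded into a DataFrame and passed to a text classifier; that part is left out here because it depends on how the data ends up being stored.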

Question: How to get malware binaries data with PE headers?

jlobo · April 29, 2022, 1:13am

Any images will do, either ImageNet or Imagenette, and the labels are just the hashes of the images.
The idea is to build a “neural hash” that preserves properties of a secure hash function, such as the uniform distribution of its output.
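One way the labels could be built, as a sketch: hash the raw image bytes with SHA-256 and unpack the digest into 256 binary targets, one per output bit. The helper names are hypothetical, and whether a network can learn this mapping at all is exactly what the experiment would test.

```python
import hashlib


def sha256_bits(data: bytes):
    """Return the SHA-256 digest of data as a list of 256 bits (0 or 1)."""
    digest = hashlib.sha256(data).digest()  # 32 bytes
    # Unpack each byte most significant bit first.
    return [(byte >> shift) & 1 for byte in digest for shift in range(7, -1, -1)]


def bits_to_hex(bits):
    """Pack 256 bits back into the usual 64-character hex digest."""
    value = int("".join(map(str, bits)), 2)
    return f"{value:064x}"


# Placeholder bytes; in practice this would be the content of an image file.
target = sha256_bits(b"example image bytes")
```

Treating the target as 256 independent binary outputs also makes it easy to check the uniformity property: over many images, each bit should be 1 about half the time.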

mindtrinket · April 29, 2022, 4:01pm

That was my take.

They used something beyond strings, with FFRI/ffridataset-scripts on GitHub (“Make datasets like FFRI Dataset”), which is very interesting.

I think I will work on setting up a lambda function to grab some malicious executables online and start by just running strings on them. I like the list of malicious executables from Free Malware Sample Sources for Researchers. In particular, I like vx-underground because it could lead to an interesting classification problem down the line.

Then if we grab 50 malware samples and 50 normal executables we can see if a toy problem works.

Another thing I was considering: most of the language models were trained on a natural language, not assembly code. So starting with a pretrained model would be… problematic.

mindtrinket · April 29, 2022, 4:10pm

Great question. I think you will need to adjust the layers at some point in the future (fast.ai does some of this heavy lifting). We see this in some of the lessons where we start from smaller-sized images in transforms and move them up in size. After all, you are going from 65K pixels (256x256) to 16,777K pixels (4096x4096).

Which problem are you looking into where you would need to see all of the pixels? Many use cases can be reduced to speed up training without a substantial decrease in accuracy.

Danrohn · April 29, 2022, 8:30pm

Thank you mate!
I’d say: I’ve trained my model on a dataset of 256x256 images. The model is trained to make dark images as bright as the ground truth. Great results. I saved the model. Now I want to predict a bright image from another dark photo (of size 4096x4096) and see how it performs.
I’ve saved my model in .pkl format. Now what’s next?
I tried load_learner and predict, but it seems to distort the larger image by first resizing it to 256x256 and only then applying its prediction.
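One possible workaround is to run the model tile by tile and stitch the outputs, so each crop matches the training size. The sketch below is framework-free: the image is a plain 2D list and `predict_tile` is a hypothetical stand-in for whatever calls the model on one crop. It assumes the model returns a crop of the same shape and can handle smaller tiles at the edges.

```python
def tile_boxes(height, width, tile=256):
    """Yield (top, left, bottom, right) boxes that cover the whole image."""
    for top in range(0, height, tile):
        for left in range(0, width, tile):
            yield top, left, min(top + tile, height), min(left + tile, width)


def predict_tiled(image, predict_tile, tile=256):
    """Apply predict_tile to every crop and stitch the results together.

    image: 2D list of rows (a stand-in for an array or tensor).
    predict_tile: callable returning a crop of the same shape as its input.
    """
    height, width = len(image), len(image[0])
    out = [[None] * width for _ in range(height)]
    for top, left, bottom, right in tile_boxes(height, width, tile):
        crop = [row[left:right] for row in image[top:bottom]]
        result = predict_tile(crop)
        for dy, row in enumerate(result):
            out[top + dy][left:right] = row
    return out


def brighten(crop):
    """Stand-in for the model: doubles each pixel value, capped at 255."""
    return [[min(255, 2 * p) for p in row] for row in crop]


image = [[(x + y) % 100 for x in range(10)] for y in range(6)]
result = predict_tiled(image, brighten, tile=4)
```

A 4096x4096 image splits into 16 x 16 = 256 tiles of 256x256. Real models often show seams at tile borders, so overlapping the tiles and blending the overlap may be needed.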

Danrohn · April 29, 2022, 9:02pm

Another question:
Let’s say that I want to make photos look even more detailed (like super-resolution). Can I keep training the pre-trained model on another dataset now? Is there any tutorial on combining two different models?

mindtrinket · April 29, 2022, 9:15pm

Can you post your code? Something sounds off.

Danrohn · April 29, 2022, 9:25pm

What sounds off?

KevinB · April 30, 2022, 3:55am

This felt like a good place to post the docker-compose.yml I’m using. I started with the fastai docker-compose.yml, but I wanted to be able to train a model, so I ended up scrapping all of the documentation pieces. I might bring them back at some point.

version: "3"
services:
  fastai-notebook:
    restart: unless-stopped
    working_dir: /data
    # image: fastai/codespaces
    image: pytorch/pytorch:1.9.1-cuda11.1-cudnn8-runtime
    shm_size: 16gb
    deploy:
      resources:
        reservations:
          devices:
            - driver: nvidia
              count: 1
              capabilities: [gpu]
    logging:
      driver: json-file
      options:
        max-size: 50m
    stdin_open: true
    tty: true
    volumes:
      - .:/data/
    environment:
      - LIB_INSTALL_TYPE=. #optionally change this locally to .[dev] to install dev packages as well
    command: bash -c "pip install jupyter && pip install -e $$LIB_INSTALL_TYPE && jupyter notebook --allow-root --no-browser --ip=0.0.0.0 --port=8080 --NotebookApp.token='' --NotebookApp.password=''"
    ports:
      - "8080:8080"

topj · May 2, 2022, 6:34am

Hi James,
Please add me to the list of people interested in applications of deep learning to cyber security.
Thanks!

Danrohn · May 2, 2022, 10:16am

Hey Anyone,
any idea of how to create a time-lapse of the predicted images?

I want to save a prediction batch every epoch, to create a time-lapse video of the images becoming more and more trained.

jeremy · May 2, 2022, 10:26am

You could use a Learner callback for that.