@farlion I might actually use this. Thanks.
To confirm, the assumption of this gist is that I have images denoted as a column, and multiple classes as columns in a CSV? Is it OK if the class names are the same even if the column is different (Left vs. Right orientations, for example)?
Can you tell me what license applies to this code? I try to only use code that is under Apache or a more permissive license.
Looking at that example, this might not be what I am looking for. Specifically, I have data in a form something like:
ImageName, Type, Color
I001, Bike, Blue
I002, Pedestrian, Red
Are probabilities explicitly expected by this, or can each value be a specific class? If not, my plan was to change my representation to:
ID, Values
I001, Type:Bike Color:Blue
Unless you have any suggestions. I think this is in the direction of what I am looking for, but maybe not.
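For what it's worth, here is a minimal sketch of that conversion using only Python's standard `csv` module. The sample rows are the ones from the example above; the function name `to_tagged_rows` and the `id_column` parameter are made up for illustration. Prefixing each value with its column name should also keep identically named classes in different columns apart.

```python
import csv
import io

# Sample rows taken from the example above.
WIDE_CSV = """ImageName, Type, Color
I001, Bike, Blue
I002, Pedestrian, Red
"""

def to_tagged_rows(csv_text, id_column="ImageName"):
    """Collapse every non-ID column into one 'Column:Value' string per image.

    `to_tagged_rows` and `id_column` are hypothetical names for this sketch.
    """
    # skipinitialspace drops the blank after each comma; strip() below
    # guards against stray spaces before a comma as well.
    reader = csv.DictReader(io.StringIO(csv_text), skipinitialspace=True)
    rows = []
    for record in reader:
        tags = [
            f"{column.strip()}:{value.strip()}"
            for column, value in record.items()
            if column.strip() != id_column
        ]
        rows.append((record[id_column].strip(), " ".join(tags)))
    return rows

for image_id, values in to_tagged_rows(WIDE_CSV):
    print(f"{image_id}, {values}")
```

One caveat with this format: tags are separated by spaces, so a class value that itself contains a space would need a different separator or some quoting.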
This is perfect, thanks! I love that license. I just like clarifying these things where possible. =)
Hmm, interesting! This is definitely different from what I was doing (predicting independent exact probabilities for a number of fixed classes).
I’m by no means an expert on what you’re trying to do, but if you have a small number of categories per categorical variable (i.e. only a few types and colors) then your approach to turn it into “standard” binary multi-label classification (again not sure this is idiomatic jargon) looks reasonable.
Are these independent things you are trying to predict (i.e. type does not affect color in any way) for the same image, or could there be some underlying relationships (bikes are more likely to be blue than pedestrians are)?
That was my thought as well. My concern is mostly syntax: the data has enough columns that I was hoping to adapt what you did so I could read the CSV with multiple columns directly, rather than build one very large, clunky column.
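A rough sketch of what reading the multi-column CSV directly could look like, with no intermediate "one big column" step. This is not the gist's code, just plain Python with the standard `csv` module; `load_multi_hot` and `id_column` are hypothetical names, and the sample rows are the ones from the earlier example. Each (column, value) pair becomes its own binary label, which is one way to get the binary multi-label targets discussed above.

```python
import csv
import io

# Sample rows taken from the earlier example.
SAMPLE = """ImageName, Type, Color
I001, Bike, Blue
I002, Pedestrian, Red
"""

def load_multi_hot(csv_text, id_column="ImageName"):
    """Return (labels, targets): a sorted label list and one 0/1 vector per image.

    `load_multi_hot` and `id_column` are hypothetical names for this sketch.
    """
    reader = csv.DictReader(io.StringIO(csv_text), skipinitialspace=True)
    records = [
        {key.strip(): value.strip() for key, value in row.items()}
        for row in reader
    ]
    # One label per (column, value) pair, so the same class name in two
    # different columns still maps to two different labels.
    labels = sorted(
        {
            f"{column}:{value}"
            for record in records
            for column, value in record.items()
            if column != id_column
        }
    )
    index = {label: position for position, label in enumerate(labels)}
    targets = {}
    for record in records:
        vector = [0] * len(labels)
        for column, value in record.items():
            if column != id_column:
                vector[index[f"{column}:{value}"]] = 1
        targets[record[id_column]] = vector
    return labels, targets

labels, targets = load_multi_hot(SAMPLE)
print(labels)
print(targets)
```

Adding more attribute columns to the CSV only lengthens the label list; the loading code stays the same.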
That is where things get complicated. To be more specific, my database describes aspects of streets (not the aspects that AV folks care about). So I think they can generally be treated as independent, but there are some exceptions. For example, a bike lane buffer might only appear if there is a bike lane. For now, I would say an assumption of independence is acceptable, but not perfect for all attributes.
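To make the exception concrete, here is a small sketch of a check for "the dependent attribute only appears when its parent does". Everything in it is an assumption for illustration: the column names `BikeLane` and `BikeLaneBuffer`, the `Yes`/`No` values, the made-up toy rows, and the function name `find_violations`. It could be run on the labels before training, or on predictions afterwards, to see how often the independence assumption actually breaks.

```python
# Toy rows invented purely for illustration; not real data.
ROWS = [
    {"ImageName": "S001", "BikeLane": "Yes", "BikeLaneBuffer": "Yes"},
    {"ImageName": "S002", "BikeLane": "No", "BikeLaneBuffer": "No"},
    {"ImageName": "S003", "BikeLane": "No", "BikeLaneBuffer": "Yes"},
]

def find_violations(rows, parent="BikeLane", child="BikeLaneBuffer", present="Yes"):
    """Return the IDs of rows where the child attribute is present without its parent.

    The column names and the "Yes" marker are hypothetical placeholders.
    """
    return [
        row["ImageName"]
        for row in rows
        if row[child] == present and row[parent] != present
    ]

print(find_violations(ROWS))
```

If the list stays empty or very short on real data, treating the attributes as independent is probably a reasonable first pass.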
Would be curious what you think might be a good approach or way to format the data.