Another treat! Early access to Intro To Machine Learning videos


(Nick) #692

pip does not install the Graphviz executable. You should download it yourself from https://www.graphviz.org/download/ or install it with conda: conda install -c anaconda graphviz
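
A quick way to confirm whether the Graphviz executable is actually visible to Python is to look for `dot` (the Graphviz layout program) on the PATH. This is only a minimal sketch using the standard library; the helper name `graphviz_available` is made up for illustration.

```python
import shutil


def graphviz_available(executable="dot"):
    """Return the full path of the Graphviz executable, or None if it is not on PATH."""
    return shutil.which(executable)


path = graphviz_available()
if path is None:
    print("Graphviz 'dot' not found; install it from graphviz.org or with conda.")
else:
    print("Graphviz 'dot' found at", path)
```

If this reports that `dot` is missing even after installing, restarting the notebook kernel so it picks up the updated PATH may help.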


#693

Here is an attempt at waterfall plots with plotnine; the ipynb code cells follow.
This is still a work in progress, so any comments are welcome.

%load_ext autoreload
%autoreload 2

%matplotlib inline

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from plotnine import *

b0 = pd.DataFrame({'desc': ['sales','returns','credit fees','rebates','late charges','shipping'],
        'amount': [350000,-30000,-7500,-25000,95000,-7000]})

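def comma(x):
    'Format a number, or each number in a sequence, with thousands separators.'
    if np.ndim(x) > 0:
        # array-like input: format every element
        res = ["{:,.0f}".format(el) for el in x]
    else:
        # scalar input (len() would fail on a plain number)
        res = "{:,.0f}".format(x)

    return res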


def waterfall_df(balance):
    """
    Expects a data frame with two columns named 'amount' and 'desc'.
    """
    balance.desc = pd.Categorical(balance.desc, categories=balance.desc)
    balance['types'] = ["increase" if v > 0 else "decrease" for v in balance.amount]
    total =  balance.amount.sum()
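    # DataFrame.append was removed in pandas 2.0; pd.concat adds the 'net' row instead
    balance = pd.concat([balance, pd.DataFrame([{'amount': total, 'desc': 'net', 'types': 'net'}])], ignore_index=True)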
    balance['ind'] = np.arange(balance.shape[0])  # bar position along the x axis
    cols = balance.columns.values
    #print(balance.amount.cumsum())
    balance.types = pd.Categorical(balance.types, categories=['decrease', 'increase', 'net']) #balance.types.unique())
    balance.iloc[0, len(cols) -2] = "net"
    csum = balance.amount.cumsum()
    zero_s = pd.Series([0.0],index=[len(csum)-1])

    # Series.append was removed in pandas 2.0; pd.concat is the replacement
    balance['end'] = pd.concat([csum[0:len(csum)-1], zero_s])
    balance['start'] = csum[0:len(csum)].shift(1).fillna(0)
    cmap = [ '#d83000' if v < 0 else '#242b73' for v in balance['amount']]
    balance['cmap'] = cmap   

    return  balance

def waterfall_plot(balance):
    ind = balance.ind.values
    end = balance.end.values
    start = balance.start.values
    end_lbl = comma(end)
    start_lbl = comma(start)
    nudge_end = [1 if e < s else -0.3 for e, s in zip(end,start)]
    nudge_start = [-0.3 if e < s else 1 for e, s in zip(end,start)]
    black = '#222222'
    y_min = balance.end.values.min()
    y_max = balance.end.values.max() + (0.2 * balance.end.values.max())

    p1 = (ggplot(balance, aes('ind', fill = 'types')) + 
          geom_rect(aes(x = 'ind',xmin = ind - 0.45, xmax = ind + 0.45, ymin = end,ymax = start)) +
          xlab("") + 
          ylab("") + 
          theme_seaborn() ) #+
          #theme( 
          #    axis_text = element_text(balance.desc, color='#555555', size=8, angle=45, va='bottom', margin={'t':10,'b':10})))
          #    axis_text_x=element_text(color=black)))
    for s, e, i, t , a in zip(balance.start, balance.end, balance.ind, balance.types, balance.amount):
        if t == 'increase' :
            p1 = p1 + geom_text( 
                            aes(x=i,y=e, label = a, nudge_y = 1), va='bottom', size = 8,format_string="{:,.0f}")
        elif (t=='net') & (e > 0):
            p1 = p1 + geom_text( 
                            aes(x=i,y=e,label = a, nudge_y=nudge_end[0] ),  va='bottom',  size = 8, format_string="{:,.0f}") 
        elif (t=='net') & (s > 0):
            p1 = p1 + geom_text(
                            aes(x=i,y=s, label = a, nudge_y = nudge_start[len(nudge_start)-1]), 
                            va='bottom', size = 8,format_string="{:,.0f}")

        elif t=='decrease':
            p1 = p1 + geom_text( 
                            aes(x=i,y=e, label = a, nudge_y = -0.3),  va='top', size = 8,format_string="{:,.0f}")
            
    p1 = p1 + geom_label(aes(y=y_max,label='desc'), color=black, size=8, angle=20, va='center')
    #p1 = p1 + scale_fill_manual(values = [('decrease', "indianred"),('increase' ,"forestgreen"), ('net', "dodgerblue2")])
    return p1

waterfall_plot(waterfall_df(b0))

[Image: Screenshot from 2018-07-01 09-05-16]

Try it on your own data.
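
To make the arithmetic in waterfall_df easier to follow, here is a minimal sketch of the same start/end bookkeeping in plain Python, without pandas or plotnine. Each bar starts where the previous one ended, and the final 'net' bar runs from the running total back down to zero. The function name `waterfall_bounds` is hypothetical and not part of the code above.

```python
from itertools import accumulate


def waterfall_bounds(amounts):
    """Return (start, end) pairs for each bar, plus a final 'net' bar."""
    ends = list(accumulate(amounts))   # running total after each item
    starts = [0] + ends[:-1]           # each bar starts at the previous end
    bars = list(zip(starts, ends))
    bars.append((ends[-1], 0))         # net bar: from the total back to zero
    return bars


amounts = [350000, -30000, -7500, -25000, 95000, -7000]
for start, end in waterfall_bounds(amounts):
    print("{:,.0f} -> {:,.0f}".format(start, end))
```

A bar whose end is below its start is a decrease, which is the same comparison waterfall_plot uses to decide which way to nudge the labels.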


(sashank) #694

Are these videos enough for us to start working on machine learning models in the real world? Can you please help me with this?


(PRATIKSHA) #695

What is artificial intelligence?


(Kaushik Perika) #696

Sir, I cannot thank you enough :pray::pray:


(Kofi Asiedu Brempong) #697

We could each create an initial model and cross-check to see what we can learn from each other.


(sid) #698

Should we choose a different dataset, as the house prices dataset has only 1461 samples for training?
You could also email me at my username @ hotmail.com.


(Kofi Asiedu Brempong) #699

sure …
Do you have any suggestions for a different dataset?


(Kofi Asiedu Brempong) #700

Hi all, I tried building an image classifier based on lesson 1 of part 1.
I wrote my first Medium post based on the results I got. Please check it out and let me know your thoughts.


(sashank) #701

I am getting an error that the kernel died when I execute the code below in lesson 1. Can anyone please help me?

df, y, nas = proc_df(df_raw, 'SalePrice')


(Utkarsh Mishra) #702

Can anyone help me with GBM and XGBoost?
Are there any lecture series or YouTube videos?


(Leo Yu Ho, Lo) #703

I presented the “Ethics and Data Science” materials to my research group today, a 30-person data visualization group at HKUST! It was an underdiscussed issue, but not anymore! Thank you, @jeremy and @rachel!

Here are my slides, which copy and annotate the original course slides on GitHub.


(Sumit) #705

Hi,

In lesson4-mnist_sgd, I’m facing the issue below:

[Image]

Can anyone help me?

The earlier post got mixed up with the reply to @sashank’s post!

Thanks,
Sumit


(Sumit) #706

@sashankpappu

Have you tried restarting the notebook and running the code again?

Sumit


(Erick Giffoni) #707

Hi, people. Can someone help me with lesson 2? [Workbook 1]

I got this when running:

[Image: Screenshot 2018-07-13 at 12.16.17]


(Erick Giffoni) #708

and

[Image: Screenshot 2018-07-13 at 12.16.36]


(Sumit) #709

@Erick_Giffoni

Please run this: conda install -c anaconda graphviz
Let me know if it doesn’t work.


(Sumit) #710

@Jdemlow

I’m facing the same issue but haven’t been able to solve it.
After looking through the forums I ran conda install -c defaults intel-openmp -f, but nothing happened, and I also don’t know what it is supposed to do.

Have you solved it? Can you please help me?

Thanks,
Sumit


(sashank) #711

Yes, it’s the same … I later realized that the feature logic is not working on my system, not sure why.


(Sumit) #712

Can you tell me the shape of df and y?
Also, what is shown after you run proc_df? Any error or other output would help to figure out what’s wrong.

Also, if df and y do have values, can you post the first 4-5 rows here?

Thanks,
Sumit