Saturday May 27th 2017

Select Features and Target in Scikit Learn

To do machine learning in scikit-learn, we first need to import the following:
import pandas as pd
import numpy as np
Step 1. We read the file into a pandas DataFrame with pd.read_csv.
In a Jupyter notebook we define the DataFrame as
df = pd.read_csv(r'C:\Data\glass.csv')
Pic 1. Data import in SKLearn
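
For readers without the file at hand, the loading step can be tried on a small in-memory sample. This is only a sketch: the two rows are made-up values laid out in the same column order as the glass data, and io.StringIO stands in for the CSV file on disk.

```python
import io

import pandas as pd

# Made-up sample rows (illustrative values, not the real glass data),
# using the same column layout with the target column 'Type' last.
csv_text = """RI,Na,Mg,Al,Si,K,Ca,Ba,Fe,Type
1.52,13.6,4.5,1.1,71.8,0.06,8.8,0.0,0.0,1
1.51,13.9,3.6,1.4,72.7,0.48,7.8,0.0,0.0,2
"""

# read_csv accepts a file path or any file-like object;
# StringIO plays the role of the file here.
sample = pd.read_csv(io.StringIO(csv_text))

print(sample.shape)        # rows and columns that were read
print(sample.columns[-1])  # name of the last column
```

With a real file, passing the path as a raw string (the r prefix) keeps Python from treating the backslashes in a Windows path as escape characters.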

To select the features and the target for machine learning, we use the following commands:

X = df[list(df.columns)[:-1]]  # input
y = df['Type']
X holds the input features.
y holds the target.
The command X = df[list(df.columns)[:-1]] keeps every column except the last one, Type, so Type is left out of the input features; y = df['Type'] then selects that column as the target.
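
As a quick check of this idea, here is a minimal sketch on a made-up three-row table (the name toy and its values are illustrative, not the glass data). It also shows an alternative, dropping the target by name with DataFrame.drop, which gives the same result when the target is the last column and keeps working if the column order changes.

```python
import pandas as pd

# Made-up toy rows (not the real glass data); the target 'Type' is the last column.
toy = pd.DataFrame({
    "RI": [1.52, 1.51, 1.53],
    "Na": [13.6, 13.9, 13.5],
    "Type": [1, 1, 2],
})

# Positional selection, as in the text: every column except the last.
X_by_position = toy[list(toy.columns)[:-1]]

# Selection by name: drop the target column wherever it sits.
X_by_name = toy.drop(columns=["Type"])

# The target is picked out by its column name.
y = toy["Type"]

print(list(X_by_position.columns))
print(X_by_position.equals(X_by_name))
```

The positional form relies on the target being the last column; the by-name form does not, so it may be the safer choice when the file layout is not under your control.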
If we check X by running X.info(), we get the following columns:

Int64Index: 214 entries, 0 to 213
Data columns (total 9 columns):
RI 214 non-null float64
Na 214 non-null float64
Mg 214 non-null float64
Al 214 non-null float64
Si 214 non-null float64
K 214 non-null float64
Ca 214 non-null float64
Ba 214 non-null float64
Fe 214 non-null float64
dtypes: float64(9)
memory usage: 16.7 KB

You will notice that the 'Type' column, which is the target variable, is not shown above, because [:-1] removed the last column, which was the target column. This is a very useful command: if we enter -2, the last 2 columns will be removed, and so on.
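
The slicing behaviour can be sketched on a small made-up table (toy and its values are hypothetical). The last line uses DataFrame.iloc, a positional indexer that is not part of the original commands, as an equivalent way to write the same slice.

```python
import pandas as pd

# Made-up toy table with four columns; 'Type' is last.
toy = pd.DataFrame({
    "RI": [1.52, 1.51],
    "Na": [13.6, 13.9],
    "Mg": [4.5, 3.6],
    "Type": [1, 2],
})

cols = list(toy.columns)

# [:-1] keeps everything except the last column.
without_last = toy[cols[:-1]]

# [:-2] keeps everything except the last two columns.
without_last_two = toy[cols[:-2]]

# Same selection written with positional indexing: all rows, all but the last two columns.
same_with_iloc = toy.iloc[:, :-2]

print(list(without_last.columns))
print(list(without_last_two.columns))
```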