Machine Learning Fundamentals in Python


1 out of 15

A GradientBoostingClassifier model is fitted on the training data X_train and y_train, and stored as model. Use model and the X_test feature data to predict values for the response variable, and store the result in y_pred.

Complete the code to return the output

from sklearn.ensemble import GradientBoostingClassifier
from sklearn.metrics import accuracy_score

model = GradientBoostingClassifier(n_estimators=300, max_depth=1, random_state=1)
model.fit(X_train, y_train)

y_pred = model.predict(X_test)
print(accuracy_score(y_test, y_pred))

Expected Output

0.95

Your Output

0.95
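
The split arrays X_train, X_test, y_train, y_test are assumed to already exist in the session. For context, a minimal sketch of how such a split is typically produced (the dataset and split parameters here are illustrative, not the assessment's):

from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split

# Illustrative only: the assessment supplies the split arrays already
X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=1)
print(X_train.shape, X_test.shape)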

2 out of 15

Fill in the blanks

You are supplied with two sets of variables: y_pred contains the values predicted by the supplied model, and y_test contains the actual response values.

Print the main classification metrics for the model.

Complete the code to return the output

from sklearn.tree import DecisionTreeClassifier
from sklearn import metrics

model = DecisionTreeClassifier(max_depth=4, random_state=42)

model.fit(X_train, y_train)

y_pred = model.predict(X_test)

print(metrics.classification_report(y_test, y_pred))

Expected Output

              precision    recall  f1-score   support

           0       0.93      0.94      0.94        54
           1       0.97      0.96      0.96        89

    accuracy                           0.95       143
   macro avg       0.95      0.95      0.95       143
weighted avg       0.95      0.95      0.95       143

Your Output

              precision    recall  f1-score   support

           0       0.93      0.94      0.94        54
           1       0.97      0.96      0.96        89

    accuracy                           0.95       143
   macro avg       0.95      0.95      0.95       143
weighted avg       0.95      0.95      0.95       143
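
For reference, every number in a classification report is derived from the confusion matrix; a small self-contained sketch with made-up labels (not the assessment data):

from sklearn import metrics

# Per-class precision, recall and F1 all come from these counts
y_true = [0, 0, 0, 1, 1, 1, 1]
y_hat  = [0, 1, 0, 1, 1, 1, 0]
print(metrics.confusion_matrix(y_true, y_hat))
print(metrics.classification_report(y_true, y_hat))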

3 out of 15

Fill in the blanks

To understand the impact of weekly income on the amount a family spends each week on groceries, fit a suitable model using the weekly_spend data. What is the value of the intercept of the model?

   spend  income  children  car
0     20   19549         0    0
1     52   95248         1    1
2     18   27693         0    1
3     37   50788         1    1
4     46   50312         0    1

Complete the code to return the output

import numpy as np
from sklearn.linear_model import LinearRegression

X = np.array(weekly_spend['income']).reshape(-1, 1)
y = weekly_spend['spend']

mod = LinearRegression(fit_intercept=True)
mod.fit(X, y)

print(mod.intercept_.round(2))

Expected Output

14.39

Your Output

Traceback (most recent call last):
  File "<stdin>", line 8, in <module>
    mod.fit(X, y)
NameError: name 'y' is not defined
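
As a sanity check on the intercept, a one-predictor least-squares fit can also be done with np.polyfit; this sketch assumes the weekly_spend DataFrame from above is still available in the session:

import numpy as np

# polyfit with degree 1 returns [slope, intercept]; the intercept should match
# the LinearRegression value above (14.39 after rounding)
slope, intercept = np.polyfit(weekly_spend['income'], weekly_spend['spend'], 1)
print(round(intercept, 2))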

4 out of 15

A random forest model has been fitted to train data.

Use the results from a random forest classifier to assess the importance of each feature in predicting whether or not a patient has heart disease.

Complete the code to return the output

import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns
from sklearn.ensemble import RandomForestClassifier

model = RandomForestClassifier(n_estimators=15,
                               random_state=1)
model.fit(X_train, y_train)

# Create a DataFrame with the feature importances
feature_importances = pd.DataFrame(
    {"feature": list(X.columns), "importance": model.feature_importances_}
).sort_values("importance", ascending=False)

sns.barplot(data=feature_importances, x="importance", y="feature")
plt.show()

Expected Output

[graph: bar plot of feature importances, one bar per feature]

5 out of 15

Available in your working session is the dataset scaled_samples. Instantiate a principal component analysis model object with 2 components, and fit the model to the scaled_samples object.

Complete the code to return the output

from sklearn.decomposition import PCA

pca = PCA(n_components=2)
pca.fit(scaled_samples)

pca_features = pca.transform(scaled_samples)
print(pca_features.shape)

Expected Output

(85, 2)
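
A common follow-up after reducing to two components is to check how much variance they retain; a self-contained sketch using random stand-in data (not scaled_samples):

import numpy as np
from sklearn.decomposition import PCA

# explained_variance_ratio_ gives the share of total variance captured per component
rng = np.random.default_rng(0)
demo = rng.normal(size=(85, 5))
pca_demo = PCA(n_components=2).fit(demo)
print(pca_demo.explained_variance_ratio_)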

6 out of 15

As part of a brainstorming session for a medical startup idea that predicts the chance of diabetes based on a few key readings, you have a sample DataFrame named df with key readings for diabetes:

   Glucose  BloodPressure  Insulin
0      148             72        0
1       85             66        0
2      183             64        0
3       89             66       94
4       78             50      168

Apply Min-Max scaling so that all numeric columns are between 0 and 1.

Complete the code to return the output

import pandas as pd
from sklearn.preprocessing import MinMaxScaler

min_max = MinMaxScaler()
df_scaled = pd.DataFrame(min_max.fit_transform(df), columns=df.columns)
print(df_scaled.head())

Expected Output

   BloodPressure   Glucose   Insulin
0       1.000000  0.666667  0.000000
1       0.727273  0.066667  0.000000
2       0.636364  1.000000  0.000000
3       0.727273  0.104762  0.559524
4       0.000000  0.000000  1.000000
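
The scaled values follow directly from the Min-Max formula (x - min) / (max - min); a quick hand check on the Glucose column of the sample shown above:

# Glucose ranges from 78 to 183 in the sample, so 148 maps to (148 - 78) / (183 - 78)
glucose = [148, 85, 183, 89, 78]
lo, hi = min(glucose), max(glucose)
print(round((148 - lo) / (hi - lo), 6))  # 0.666667, matching the first scaled Glucose value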

7 out of 15

Create and fit a random forest classifier with 15 trees to predict whether or not a patient has heart disease. The data has already been split into train and test sets (X_train, X_test, y_train, y_test).

Complete the code to return the output

from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score

model = RandomForestClassifier(n_estimators=15,
                               random_state=1)
model.fit(X_train, y_train)

y_pred = model.predict(X_test)
print(accuracy_score(y_test, y_pred))

Expected Output

0.8032786885245902

8 out of 15

The scatterplot shows data for a sample of 14 biofuels. Fit a linear regression model and print the intercept and coefficient.

Complete the code to return the output

from sklearn.linear_model import LinearRegression

model = LinearRegression(fit_intercept=True)

model.fit(x, y)

print("Regression coefficients: {}".format(model.coef_))
print("Regression intercept: {}".format(model.intercept_))

Expected Output

Regression coefficients: [[-0.20938742]]
Regression intercept: [75.21243193]
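
The bracketed output indicates that y was passed as a 2-D column vector; a minimal self-contained sketch (made-up data) reproducing that shape behaviour:

import numpy as np
from sklearn.linear_model import LinearRegression

# With a 2-D y of shape (14, 1), coef_ is 2-D and intercept_ is a 1-D array
x_demo = np.arange(14, dtype=float).reshape(-1, 1)
y_demo = 75.2 - 0.21 * x_demo
m = LinearRegression().fit(x_demo, y_demo)
print(m.coef_, m.intercept_)  # coef_ ~ [[-0.21]], intercept_ ~ [75.2]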

9 out of 15

Consider the first five rows of the data frame df shown below. Apply a pre-processing step to standardize all numeric features.

  Sepal.Length Sepal.Width Petal.Length Petal.Width Species
1          5.1         3.5          1.4         0.2  setosa
2          4.9         3.0          1.4         0.2  setosa
3          4.7         3.2          1.3         0.2  setosa
4          4.6         3.1          1.5         0.2  setosa
5          5.0         3.6          1.4         0.2  setosa
6          5.4         3.9          1.7         0.4  setosa

Complete the code to return the output

import pandas as pd
from sklearn.preprocessing import StandardScaler

# Standardize only the numeric feature columns (Species is categorical)
num_cols = df.select_dtypes(include='number').columns
scaler = StandardScaler()
df_scaled = pd.DataFrame(scaler.fit_transform(df[num_cols]), columns=num_cols)
print(df_scaled.head())

Expected Output

   sepal length (cm)  sepal width (cm)  petal length (cm)  petal width (cm)
0          -0.900681          1.019004          -1.340227         -1.315444
1          -1.143017         -0.131979          -1.340227         -1.315444
2          -1.385353          0.328414          -1.397064         -1.315444
3          -1.506521          0.098217          -1.283389         -1.315444
4          -1.021849          1.249201          -1.340227         -1.315444
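
A quick way to confirm the standardization worked, assuming df_scaled from the snippet above is still in the session:

# Each standardized column should have mean ~0 and population standard deviation ~1
print(df_scaled.mean().round(2))
print(df_scaled.std(ddof=0).round(2))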

10 out of 15

Training (X_train, y_train) and test (X_test, y_test) sets have been created from the original data df. Complete the code using the data sets in the appropriate places.

Complete the code to return the output

from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error

lin_reg = LinearRegression()
lin_reg.fit(X_train,y_train)
predictions = lin_reg.predict(X_test)
print("Mean squared error: %.2f" % mean_squared_error(y_test, predictions))

Expected Output

Mean squared error: 8.47
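
Since MSE is in squared units of the response, it is often reported alongside its square root; a short sketch reusing y_test, predictions and the import from the snippet above:

import numpy as np

# RMSE is just the square root of the MSE
print("Root mean squared error: %.2f" % np.sqrt(mean_squared_error(y_test, predictions)))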

11 out of 15

A dataset has been prepared for you and split into test and training sets (X_train, X_test, y_train, y_test).

Use sklearn to fit a gradient boosting classification model on the training data with 300 estimators and a learning rate of 0.01.

Complete the code to return the output

from sklearn.ensemble import GradientBoostingClassifier

model = GradientBoostingClassifier(n_estimators=300, learning_rate=0.01, random_state=42)
model.fit(X_train, y_train)

y_pred = model.predict(X_test)

print(model.score(X_test, y_test))

Expected Output

0.8888888888888888
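
For a classifier, model.score(X_test, y_test) returns accuracy, so an equivalent call (assuming y_pred and y_test from the snippet above) is:

from sklearn.metrics import accuracy_score

# Should print the same value as model.score(X_test, y_test) above
print(accuracy_score(y_test, y_pred))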

12 out of 15

A dataset has been prepared for you and fed into a random forest model.

Use sklearn to show the predicted probabilities of a new data point belonging to each class.

>>> print(new)
Alcohol    Malic.acid  Phenols Flavanoids
13.64      3.10        2.70    3.01

Complete the code to return the output

from sklearn.ensemble import RandomForestClassifier

model = RandomForestClassifier(random_state=42)

model.fit(X_train, y_train)

print(model.predict_proba(new))

Expected Output

[[0. 1. 0.]]
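
For reference, a one-row DataFrame like new can be built as follows; this is a sketch only, since the assessment supplies new already:

import pandas as pd

# Column names must match the features the model was trained on
new = pd.DataFrame(
    [[13.64, 3.10, 2.70, 3.01]],
    columns=["Alcohol", "Malic.acid", "Phenols", "Flavanoids"],
)
print(new)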

13 out of 15

Consider the data frame df below that shows the total number of observations per month. Fit a suitable imputer to fill the missing values.

Date        Ozone  Solar  Wind
1976-05-31     26     27    31
1976-06-30      9     30    30
1976-07-31     26     31    31
1976-08-31     26     28    31
1976-09-30     29     30    31

Complete the code to return the output

from sklearn.impute import SimpleImputer

imputer = SimpleImputer(strategy='median')

print(imputer.fit(df))

Expected Output

SimpleImputer(add_indicator=False, copy=True, fill_value=None,
              missing_values=nan, strategy='median', verbose=0)
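
Fitting only learns the per-column medians; actually replacing the missing values takes a transform step, sketched here under the assumption that the fitted imputer and df from above are available:

import pandas as pd

# transform returns an array, so wrap it back into a DataFrame with the original labels
df_filled = pd.DataFrame(imputer.transform(df), columns=df.columns, index=df.index)
print(df_filled.head())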

14 out of 15

A linear regression model has been fitted to X_train with respect to y_train.

Test values are provided in the arrays X_test and y_test.

Diagnose potential problems by plotting the residuals against the fitted values, including a lowess smoother.

Complete the code to return the output

from matplotlib import pyplot as plt
import seaborn as sns
from sklearn.linear_model import LinearRegression

model = LinearRegression().fit(X_train, y_train)

y_fitted = model.predict(X_test)

# Residuals on the y-axis against fitted values on the x-axis, with a lowess smoother
sns.residplot(x=y_fitted, y=y_test - y_fitted, lowess=True)

plt.show()

Expected Output

[graph: residuals plotted against fitted values with a lowess smoother]

15 out of 15

Consider the variable x in the pandas DataFrame df shown in the plot below. Note that the data contains both positive and negative values. Apply a suitable transformation to the data.

Complete the code to return the output

from sklearn.preprocessing import PowerTransformer

# Yeo-Johnson (unlike Box-Cox or a plain log) accepts negative values
log = PowerTransformer(method='yeo-johnson')
df['log_x'] = log.fit_transform(df[['x']])
print(df['log_x'].head())

Expected Output

0   -0.319518
1    1.714791
2    0.823256
3    0.414669
4   -1.054036
Name: log_x, dtype: float64
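
A quick way to judge whether the transformation helped, assuming the df from the snippet above (with both the original x and the new log_x column): compare skewness before and after.

# Yeo-Johnson should pull the skewness of the transformed column toward 0
print(round(df['x'].skew(), 2), round(df['log_x'].skew(), 2))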



Score Compare: Apr 23rd, 2021, 2:48pm vs Jul 4th, 2021, 3:53pm (Novice / Intermediate / Advanced scale): +29 overall, now 134.

Your score increased by 29 overall. You started with a score of 105 and just measured 134. Congratulations!

