Machine Learning Engineering

Matheus Schmitz
LinkedIn
Github Portfolio

Task 1: Object Oriented Programming for Machine Learning

Write a Python class, using principles of object-oriented design, to wrap a logistic regression estimator with the following functionality:

1. The ability to fit on training data:

self.fit(X, y)

I/O Typing Definiion
Input X : pd.DataFrame Input features
Input y : np.ndarray Ground truth labels as a numpy array of 0-s and 1-s.
Output None

2. The ability to predict class labels on new data:

self.predict(X)

I/O Typing Definiion
Input X : pd.DataFrame Input features
Output np.ndarray Ex: np.array([1, 0, 1])

3. The ability to predict the probability of each label:

self.predict_proba(X)

I/O Typing Definiion
Input X : pd.DataFrame Input features
Output np.ndarray Ex: np.array([[0.2, 0.8], [0.9, 0.1], [0.5, 0.5]])

4. The ability to get the value of the following metrics: F1-score, LogLoss:

self.evaluate(X, y)

I/O Typing Definiion
Input X : pd.DataFrame Input features
Input y : np.ndarray Ground truth labels as a numpy array of 0-s and 1-s.
Output dict Ex: {'f1_score': 0.3, 'logloss': 0.7}

5. The ability to run K-fold cross validation to choose the best parameters:
Note: Output the average scores across all CV validation partitions and best parameters

self.tune_parameters(X, y)

I/O Typing Definiion
Input X : pd.DataFrame Input features
Input y : np.ndarray Ground truth labels as a numpy array of 0-s and 1-s.
Output dict Ex: {'tol': 0.02, 'fit_intercept': False, 'solver': 'sag', 'scores': {'f1_score': 0.3, 'logloss': 0.7}}

Task 2: Unit Testing

Please write the unit tests to check whether your model:

Data

To test your implementation, please use the attached dataset sample_dataset.csv. It contains a binary classification target for predicting loan defaults, where the column is_bad contains the target variable and all other columns contain features. Make sure to explore the dataset to think of feature corner cases which your model must handle.

Assume the data can have numeric and categorical variables. Both your training and prediction functions will take in a two-dimensional pandas DataFrame with a mixture of categorical and numeric variables.

Imports

Load Data

Design OOP ML Model

Unit Testing

End

Matheus Schmitz
LinkedIn
Github Portfolio