Genetic Algorithm Recommender

Matheus Schmitz
LinkedIn
Github Portfolio

Dataset: Fermentation Lab Experiment

A CSV file containing empirical data for a fermentation lab experiment (e.g. yogurt).

The variables of each experiment are:

The post-fermentation measurements of each experiment are:

Challenge

For this challenge, we want to leverage the fermentation lab experiment dataset to implement a function that suggests experimental parameters likely to yield a desired output. In other words, the function should suggest which experiment a scientist would have to conduct for it to yield a specific fermentation time and coagulation quality:

suggest_experimental_parameters(
    desired_fermentation_time=83,
    desired_coagulation_quality=1.22)

{
    protein_quantity: 0.22,
    starch_quantity: 0.41,
    probiotic_quantity: 0.35,
    water_quantity: 0.16,
    starting_ph: 6.3
} # just an example

Hint: How certain are you that the suggestions would yield the desired results? Is there more than one likely experiment that would yield the desired post-fermentation properties? Keep in mind the explanatory versus response variables of your model.

Phase 1: Data Exploration

Imports

Load Data

Phase 2: Solution Design

Imports

Load Data

Machine Learning

These models learn to predict the properties that result from a given set of experimental parameters.

They are used later on as part of the discriminator system, so that the members of each generation can be selected based on how closely their predicted properties approximate the desired ones.
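As a rough illustration of what such a property predictor looks like from the outside, here is a minimal sketch using a tiny nearest-neighbour regressor. This is a stand-in chosen only because it needs no external library; it is not necessarily the model type trained in this notebook. The toy rows, the target values, and the names `knn_predict` and `predict_fermentation_time` are all hypothetical.

```python
import math

def knn_predict(train_X, train_y, query, k=3):
    """Predict a property as the mean target of the k nearest training rows."""
    # Euclidean distance from the query to every training row.
    # Features are left unscaled here, so the pH column dominates the distance.
    distances = [
        (math.dist(row, query), target)
        for row, target in zip(train_X, train_y)
    ]
    distances.sort(key=lambda pair: pair[0])
    nearest = distances[:k]
    return sum(target for _, target in nearest) / len(nearest)

# Hypothetical toy rows: (protein, starch, probiotic, water, starting_ph).
toy_X = [
    (0.25, 0.25, 0.25, 0.25, 6.0),
    (0.40, 0.20, 0.20, 0.20, 6.5),
    (0.10, 0.30, 0.30, 0.30, 5.5),
]
toy_fermentation_time = [80.0, 90.0, 70.0]  # made-up values, not from the dataset

def predict_fermentation_time(params):
    """Map one set of experimental parameters to a predicted fermentation time."""
    return knn_predict(toy_X, toy_fermentation_time, params, k=1)
```

What matters for the rest of the pipeline is only the interface: one callable per property that maps a parameter vector to a predicted value.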

Fermentation Time

Coagulation Quality

Genetic Algorithm

Generator: Genetic Algorithm
Discriminator: Fitness Function
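The generator/discriminator split can be sketched as a small loop: the fitness function ranks candidates, and the generator refills the population by mutating the survivors. The sketch below is a simplified, mutation-only version (no crossover) under assumed names (`evolve`, `keep`) and a toy one-dimensional problem; the notebook's actual implementation may differ.

```python
import random

def evolve(loss, population, mutate, generations=30, keep=0.5, rng=None):
    """Return the final population, sorted best (lowest loss) first."""
    if rng is None:
        rng = random.Random()
    population = list(population)
    size = len(population)
    for _ in range(generations):
        # Discriminator step: rank candidates by loss, lower is better.
        population.sort(key=loss)
        survivors = population[: max(1, int(size * keep))]
        # Generator step: refill the population with mutated survivors.
        children = [
            mutate(rng.choice(survivors), rng)
            for _ in range(size - len(survivors))
        ]
        population = survivors + children
    population.sort(key=loss)
    return population

# Toy usage: search for a number close to 3.
rng = random.Random(0)
start = [rng.uniform(-10, 10) for _ in range(20)]
final = evolve(
    loss=lambda x: abs(x - 3),
    population=start,
    mutate=lambda x, r: x + r.gauss(0, 0.5),
    rng=rng,
)
```

Because the survivors are carried over unchanged, the best candidate in each generation is never worse than the best of the previous one.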

Features 1-4 are percentages, so I modify the genetic algorithm to express its mutations as relative changes to those percentages, keeping the sum of the four features always equal to one. This is loosely analogous to applying a softmax activation to those features.
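One simple way to keep the four features summing to one is to scale each by a random positive factor and then renormalise. The sketch below uses plain renormalisation rather than an actual softmax, and the function name `mutate_fractions` and the `scale` parameter are assumptions made for illustration.

```python
import random

def mutate_fractions(fractions, rng, scale=0.1):
    """Apply relative mutations to the fractions, then renormalise to sum to 1.

    Assumes all fractions are positive and 0 <= scale < 1, so every
    multiplicative factor stays positive.
    """
    # Multiply each fraction by a random factor in [1 - scale, 1 + scale].
    mutated = [f * (1.0 + rng.uniform(-scale, scale)) for f in fractions]
    # Divide by the new total so the four values sum to one again.
    total = sum(mutated)
    return [m / total for m in mutated]

rng = random.Random(42)
parent = [0.22, 0.41, 0.21, 0.16]  # hypothetical fractions summing to 1
child = mutate_fractions(parent, rng)
```

Relative (multiplicative) changes have the convenient side effect that a positive fraction can never become negative.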

Additionally, since the predictive models are only trained on feature values within the range seen in the training data, I bound each feature's mutations within the genetic algorithm to the range observed for that feature at training time.
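Bounding mutations to the training range can be done by recording each feature's minimum and maximum once and clipping every mutated candidate against them. The helper names `feature_bounds` and `clip_to_bounds` and the toy rows below are hypothetical.

```python
def feature_bounds(rows):
    """Return a (min, max) pair for each feature column of the training rows."""
    columns = list(zip(*rows))
    return [(min(col), max(col)) for col in columns]

def clip_to_bounds(candidate, bounds):
    """Clamp each feature of a candidate into its training-time range."""
    return [
        min(max(value, low), high)
        for value, (low, high) in zip(candidate, bounds)
    ]

# Toy training rows with two features, e.g. (protein_quantity, starting_ph).
toy_rows = [(0.1, 6.0), (0.4, 6.8), (0.25, 6.3)]
bounds = feature_bounds(toy_rows)
clipped = clip_to_bounds([0.5, 5.0], bounds)
```

Note that clipping the four percentage features can break their sum-to-one constraint, so a renormalisation step may need to follow it.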

Hyperparameter Tuning

Use the training data to see which set of hyperparameters yields the closest approximations to the desired properties.

For each sample, the loss (fitness function) is the difference between the sample's predicted properties and the target properties.
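A minimal sketch of such a loss is shown below. It assumes the difference is measured as a sum of absolute gaps, one per property, optionally divided by a per-property scale so that fermentation time does not swamp coagulation quality. The exact form used in the notebook is not shown here, and the stub predictors simply return constants.

```python
def fitness(candidate, predictors, targets, scales=None):
    """Sum of scaled absolute gaps between predicted and target properties."""
    if scales is None:
        scales = {name: 1.0 for name in targets}
    loss = 0.0
    for name, target in targets.items():
        # Each predictor maps a candidate's parameters to one property.
        predicted = predictors[name](candidate)
        loss += abs(predicted - target) / scales[name]
    return loss

# Stub predictors standing in for the trained models (hypothetical values).
stub_predictors = {
    "fermentation_time": lambda params: 80.0,
    "coagulation_quality": lambda params: 1.0,
}
targets = {"fermentation_time": 83, "coagulation_quality": 1.22}
example_loss = fitness([0.22, 0.41, 0.21, 0.16, 6.3], stub_predictors, targets)
```

A loss of zero would mean the candidate is predicted to hit both targets exactly; lower is better.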

Recommendation Engine

Using the trained ML models and the tuned Genetic Algorithm, we can then generate a set of suggested experimental parameters from the final generation's population.
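Turning the final population into suggestions can be as simple as deduplicating it, ranking by loss, and returning the best few candidates together with their losses. The function name `recommend` and the `top_k` parameter are assumptions for this sketch.

```python
def recommend(final_population, loss, top_k=3):
    """Return the top_k distinct lowest-loss candidates with their losses."""
    # Convert to tuples so duplicates can be dropped with a set.
    distinct = set(map(tuple, final_population))
    ranked = sorted(distinct, key=loss)
    return [(candidate, loss(candidate)) for candidate in ranked[:top_k]]

# Toy usage with one-feature candidates and a target value of 2.
toy_population = [(1,), (2,), (2,), (5,)]
suggestions = recommend(toy_population, lambda c: abs(c[0] - 2), top_k=2)
```

Returning several candidates with their losses, rather than a single winner, speaks to the hint above: more than one experiment may plausibly yield the desired properties, and the loss gives a rough sense of how close each suggestion is predicted to be.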

Usage Example

End
