Workflow Structure
PythonML workflows contain two subworkflows. The first, called "Set Up the Job," performs setup actions, such as copying data and setting environment variables, that the job needs in order to run. The second, called "Machine Learning," contains the actual machine learning units.
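As a mental model only, the two-level structure described above can be sketched with plain Python data classes. The class names `Unit`, `Subworkflow`, and `Workflow` and the workflow name are hypothetical and chosen for illustration; they are not the platform's actual schema or API.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Unit:
    """One workflow unit (hypothetical illustration class)."""
    name: str
    flavor: str


@dataclass
class Subworkflow:
    """An ordered group of units (hypothetical illustration class)."""
    name: str
    units: List[Unit] = field(default_factory=list)


@dataclass
class Workflow:
    """A workflow made of ordered subworkflows (hypothetical illustration class)."""
    name: str
    subworkflows: List[Subworkflow] = field(default_factory=list)


workflow = Workflow(
    name="PythonML example",  # assumed name, for illustration only
    subworkflows=[
        # Setup subworkflow: its units are left empty here because
        # the text does not list them.
        Subworkflow(name="Set Up the Job"),
        # The subworkflow that holds the machine learning units.
        Subworkflow(
            name="Machine Learning",
            units=[Unit("Data Input", "pyml:data_input:read_csv:pandas")],
        ),
    ],
)

# Subworkflow names in the order they are listed.
subworkflow_order = [subworkflow.name for subworkflow in workflow.subworkflows]
print(subworkflow_order)
```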
Example¶
A diagram of an example workflow can be found below, based on the bank workflow located here:
Subworkflow: Set Up the Job
This subworkflow sets up the PythonML job. Users should not need to edit it. Its configuration is handled automatically when the predict workflow is generated.
Subworkflow: Machine Learning
This subworkflow holds the machine learning units that the user requested. It is generally the one users are expected to modify, by adding or removing machine learning workflow units. In the above diagram, we find the following units:
Unit Name | Flavor | Description |
---|---|---|
Setup Packages and Variables | pyml:setup_variables_packages | Contains functions and configuration essential for all PythonML workflows |
Data Input | pyml:data_input:read_csv:pandas | Reads in CSV data using Pandas [1] |
Data Standardize | pyml:pre_processing:standardization:sklearn | Scales the data [2] such that it has a mean of 0 and a standard deviation of 1, as implemented in Scikit-Learn [3] |
Model Train and Predict | pyml:model:multilayer_perceptron:sklearn | A Multilayer Perceptron [4] implemented in Scikit-Learn [5] |
Parity Plot | pyml:post_processing:parity_plot:matplotlib | Generates a parity plot [6] using Matplotlib [7] |
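To make the sequence of units concrete, the following is a minimal sketch of the same four steps written directly against Pandas, Scikit-Learn, and Matplotlib. It is not the code contained in the platform's units. The synthetic dataset, the column names (`x1`, `x2`, `target`), the choice of a regression model, and the hyperparameters are all assumptions made for illustration.

```python
import io

import matplotlib
matplotlib.use("Agg")  # non-interactive backend, so no display is needed
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
from sklearn.neural_network import MLPRegressor
from sklearn.preprocessing import StandardScaler

# Build a small synthetic CSV in memory (assumed data, stands in for a real file).
rng = np.random.default_rng(0)
x1 = rng.uniform(0, 10, 200)
x2 = rng.uniform(-5, 5, 200)
csv_text = pd.DataFrame(
    {"x1": x1, "x2": x2, "target": 3.0 * x1 - 2.0 * x2}
).to_csv(index=False)

# Data Input: read the CSV data with Pandas.
data = pd.read_csv(io.StringIO(csv_text))
features = data[["x1", "x2"]].to_numpy()
target = data["target"].to_numpy()

# Data Standardize: scale each feature to mean 0 and standard deviation 1.
scaler = StandardScaler()
features_scaled = scaler.fit_transform(features)

# Model Train and Predict: fit a multilayer perceptron, then predict.
# Hyperparameters are arbitrary illustration values.
model = MLPRegressor(hidden_layer_sizes=(16,), max_iter=2000, random_state=0)
model.fit(features_scaled, target)
predicted = model.predict(features_scaled)

# Parity Plot: predicted versus actual values, with the y = x reference line.
fig, ax = plt.subplots()
ax.scatter(target, predicted, s=10)
limits = [target.min(), target.max()]
ax.plot(limits, limits, linestyle="--", color="black")
ax.set_xlabel("Actual value")
ax.set_ylabel("Predicted value")

# Render the figure to an in-memory PNG instead of writing a file.
plot_buffer = io.BytesIO()
fig.savefig(plot_buffer, format="png")
plt.close(fig)
```

Points that fall on the dashed line in the resulting plot are predictions that match the actual value exactly. For brevity, this sketch predicts on the same data it was trained on; a real evaluation would hold out a separate test set.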