

Machine Learning: Train Linear Regression¶

This tutorial demonstrates how to train a machine learning (ML) model on a set of materials called "train materials". The trained model can then be used to predict the properties of another set, called "target materials", following the procedure outlined in a separate tutorial.

We consider the Electronic Band Gap in the present example; however, the general approach can work for many different target properties.

Training Set¶

For the present tutorial, we consider the following stoichiometric combinations of the elements silicon (Si) and germanium (Ge) to train our ML model for predicting the band gap. Each structure contains a total of 16 atoms, in the form of a 2x2x2 supercell of the cubic-diamond primitive unit cell, and can be generated with the help of combinatorial sets in Materials Designer.

Si2 Ge14
Si6 Ge10
Si8 Ge8
Si10 Ge6
Si12 Ge4
Si14 Ge2
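
As a quick consistency check on the list above, the following minimal sketch parses each formula and confirms that every structure contains 16 atoms. The helper names (`parse_formula`, `si_fraction`) are hypothetical and not part of the platform. The Si fraction is an assumed, illustrative descriptor, not necessarily one of the features used by the Machine Learning Engine.

```python
import re

# Training compositions from the list above, written without spaces.
TRAIN_FORMULAS = ["Si2Ge14", "Si6Ge10", "Si8Ge8", "Si10Ge6", "Si12Ge4", "Si14Ge2"]


def parse_formula(formula):
    """Return a dict mapping element symbol to atom count, e.g. {'Si': 2, 'Ge': 14}."""
    counts = {}
    for symbol, number in re.findall(r"([A-Z][a-z]?)(\d*)", formula):
        # A missing number means a single atom of that element.
        counts[symbol] = counts.get(symbol, 0) + (int(number) if number else 1)
    return counts


def si_fraction(formula):
    """Fraction of Si atoms in the cell (an assumed, illustrative descriptor)."""
    counts = parse_formula(formula)
    return counts.get("Si", 0) / sum(counts.values())


for formula in TRAIN_FORMULAS:
    total_atoms = sum(parse_formula(formula).values())
    print(formula, total_atoms, si_fraction(formula))
```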

Targets¶

In a separate tutorial, we explain how the model trained with the above materials can be used to predict the band gap of another, similar target composition, namely Si4Ge12.

Steps¶

We follow the steps below, using our Web Interface.

Obtain Training Data
Build ML Train model based on the "train materials"
Inspect Trained Model

1. Obtain Training Data¶

Copy Workflow from Bank¶

The user can import a pre-assembled workflow for calculating the band gap of materials directly from the Workflow Bank into the account-owned collection. The procedure for doing so is explained on this page.

Create and Run Job¶

Once the appropriate workflow has been copied from the Bank, we can proceed with the creation of a new Job using the Job Designer interface. We first need to select all the aforementioned materials containing Si and Ge from the account-owned materials collection and add them to the job being created.

Under the Workflow Tab of Job Designer, we then need to select the band-gap workflow imported previously. At this point, the Job can be executed for the computation of the band-gap for our set of Si/Ge-based materials.

2. Build/Train a Model¶

The "ML Train Model" Workflow can be imported from the Bank into the account-owned collection by repeating the procedure outlined here.

The user should then create and execute a new Job in the same way as in the preceding step, this time selecting the "ML Train Model" Workflow together with the Si/Ge-containing materials for which the band gap was calculated previously. This allows the Exabyte Machine Learning Engine to build the ML Train Model from the results of these band gap computations. The model can then be used to predict the band gaps of other similar materials.

The target properties (the band gap in this case) can be selected by opening the unit editor for the "input" unit of the "ML Train Model" Workflow, and scrolling down to the "Targets" section within the editor interface.
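
To illustrate what training a linear regression model involves in principle, here is a minimal sketch of an ordinary least-squares fit with a single descriptor. It is not the Machine Learning Engine's implementation. The choice of the Si fraction as the only feature is an assumption, and the target values are placeholders generated from an arbitrary line, not computed band gaps.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Assumed descriptor: Si atoms / 16 for the six training structures.
si_fractions = [2 / 16, 6 / 16, 8 / 16, 10 / 16, 12 / 16, 14 / 16]

# PLACEHOLDER targets from an arbitrary line (0.5 + 1.0 * x).
# These are NOT band gaps from any calculation; in practice they would be
# the values produced by the band-gap Job in step 1.
placeholder_targets = [0.5 + 1.0 * x for x in si_fractions]

slope, intercept = fit_line(si_fractions, placeholder_targets)
print(slope, intercept)
```

Because the placeholder targets lie exactly on a line, the fit recovers that line's slope and intercept up to rounding. Real band-gap data would generally leave a nonzero residual.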

3. Inspect Trained Model¶

Model Stored as Workflow¶

Once the ML Train Model has been built, a new Workflow called "ml_predict" is generated and can be retrieved under the results tab of job viewer for the ML train job.

This "ml_predict" workflow is automatically saved to the account-owned collection of workflows, visible through Workflow Explorer. It can subsequently be selected when creating a new Job, to predict the properties (such as the band gap) of new materials from the statistical relationships captured by the trained model, without the need for further physics-based simulations. We explain the procedure to perform such predictions in a separate tutorial page.
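
Conceptually, prediction with a stored linear model amounts to evaluating the fitted expression for a new material. The sketch below assumes a single-descriptor model (the Si fraction) and uses hypothetical placeholder coefficients. The actual contents and format of the "ml_predict" workflow are not described here, and the printed number is not a band-gap prediction.

```python
# HYPOTHETICAL stored coefficients; placeholders only, not values
# produced by the platform or by any real training run.
stored_model = {"slope": 1.0, "intercept": 0.5}


def predict(model, si_fraction):
    """Evaluate the linear model for one material; no simulation is needed."""
    return model["slope"] * si_fraction + model["intercept"]


# Target composition Si4Ge12: 4 Si atoms out of 16.
target_si_fraction = 4 / 16
predicted_value = predict(stored_model, target_si_fraction)
print(predicted_value)
```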

Model Coefficients¶

Opening the "ml_predict" Workflow allows the user to view the "Score" unit under the corresponding unit editor interface, where the model coefficients and importance are stored, together with an indication of the model precision ¹.
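
The footnote points to the coefficient of determination, so the precision indicator is presumably R². A minimal sketch of how R² is conventionally computed from reference and predicted values follows. The function name is hypothetical, and the platform may compute or report its score differently.

```python
def r_squared(actual, predicted):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_actual = sum(actual) / len(actual)
    # Residual sum of squares: squared errors of the model predictions.
    ss_res = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    # Total sum of squares: squared deviations from the mean of the data.
    ss_tot = sum((a - mean_actual) ** 2 for a in actual)
    return 1.0 - ss_res / ss_tot


# A perfect model reproduces the data exactly.
print(r_squared([1.0, 2.0, 3.0], [1.0, 2.0, 3.0]))
# A model that always predicts the mean explains none of the variance.
print(r_squared([1.0, 2.0, 3.0], [2.0, 2.0, 2.0]))
```

A value close to 1 indicates that the model explains most of the variance in the training targets. A value near 0 means it does no better than predicting their mean.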

Animation¶

We demonstrate the Web Interface-based procedure involved in building and then inspecting the ML Train Model in the animation below.

Links¶

¹ Coefficient of determination, Wikipedia (website)