Training Data
If you want to train your own network with MeshGraphNets.jl, you have to provide your data files and a corresponding metadata file in a specific format. This section describes the structure of those files.
Folder Structure
Your files have to be placed in the same folder so that MeshGraphNets.jl can find them. The folder could, for example, look like this:
- data
  - meta.json
  - test.h5 (or test.tfrecord)
  - train.h5 (or train.tfrecord)
  - valid.h5 (or valid.tfrecord)
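Before starting a training run, it can help to confirm that the folder is complete. The following Python sketch is not part of MeshGraphNets.jl; the function name `missing_dataset_files` is hypothetical, and the only thing it relies on is the file names from the layout shown here.

```python
from pathlib import Path

# File names taken from the folder layout; either extension is accepted per split.
SPLITS = ("train", "valid", "test")
EXTENSIONS = (".h5", ".tfrecord")


def missing_dataset_files(folder):
    """Return the names of required files that are absent from `folder`."""
    folder = Path(folder)
    missing = []
    if not (folder / "meta.json").is_file():
        missing.append("meta.json")
    for split in SPLITS:
        # A split counts as present if a file with any supported extension exists.
        if not any((folder / f"{split}{ext}").is_file() for ext in EXTENSIONS):
            missing.append(f"{split}.h5")
    return missing
```

Calling `missing_dataset_files("data")` returns an empty list when the metadata file and all three data files are in place.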
Files for Training, Evaluation and Testing
A separate file has to be provided for each of the three steps (see Folder Structure). It is recommended to use HDF5 files since they are easier to work with in Julia. You can also use TFRecord files; however, the provided implementation only handles files that are structured exactly like the ones from the CylinderFlow example.
⚠️ The following sections only contain explanations for using HDF5 files.
Data File Structure (train.h5, valid.h5, test.h5)
Your data files should have the following structure:
- trajectory_1
  - featurekey1
  - featurekey2
  - ...
- trajectory_2
  - featurekey1
  - featurekey2
  - ...
- ...
The names of the trajectories can be chosen arbitrarily, whereas the keys of the features have to match the ones provided in the metadata file (see File for Metadata).
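The matching requirement can be checked before training. The sketch below is plain Python and makes no claim about how MeshGraphNets.jl reads the files: a dictionary stands in for the group listing of an HDF5 file, and `check_trajectories` is a hypothetical helper name.

```python
def check_trajectories(trajectories, expected_keys):
    """Map each trajectory name to the sorted list of expected keys it lacks."""
    problems = {}
    for name, keys in trajectories.items():
        absent = sorted(set(expected_keys) - set(keys))
        if absent:
            problems[name] = absent
    return problems


# Hypothetical listing: trajectory name -> feature keys found in that group.
listing = {
    "trajectory_1": ["featurekey1", "featurekey2"],
    "run_b": ["featurekey1"],  # trajectory names are arbitrary
}
print(check_trajectories(listing, ["featurekey1", "featurekey2"]))
```

Only the feature keys are compared; the trajectory names are never checked because they are free to choose.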
File for Metadata (meta.json)
Your metadata file also has to follow a defined structure. Since the metadata file for the CylinderFlow example is handled differently, two variants are explained in the following.
Default Metadata
The default structure that you should use for your metadata is the following (example derived from the CylinderFlow metadata, not an actual metadata file):
{
    "dt": "time", # key inside the HDF5 file for timesteps
    "trajectory_length": 600, # length of trajectories (i.e. number of steps) inside data files
    "dims": [ # dimensions of the mesh (here a mesh of dimensions (5, 3))
        5,
        3
    ],
    "edges": { # minimum and maximum value of edge features (optional)
        "data_min": -0.2,
        "data_max": 0.2
    },
    "no_edges_node_types": [], # node types that should not have a connection inside the mesh (optional)
    "feature_names": [ # names of all features, mesh_pos and node_type are required
        "mesh_pos",
        "node_type",
        "velocity"
    ],
    "target_features": [ # names of target features (i.e. quantities of interest) as output of the network
        "velocity"
    ],
    "features": { # detailed information of the features given above
        "mesh_pos": { # name of the feature given above
            "key": "cl_mesh[%d,%d].pos", # key inside the HDF5 file for the feature, see below
            "split": true, # true if your feature is split between multiple keys, false otherwise
            "dim": 2, # dimension of the feature
            "type": "static", # "static" if the feature does not change over time, "dynamic" otherwise
            "dtype": "float32" # data type of the feature
        },
        "node_type": {
            "key": "cl_mesh[%d,%d].cellType",
            "dim": 1,
            "type": "static",
            "dtype": "int32",
            "onehot": true, # should the feature be represented as a one-hot vector (optional)
            "data_min": 0, # minimum value of the feature (optional, required if "data_max" specified)
            "data_max": 6 # maximum value of the feature (optional, required if "data_min" specified)
        },
        "velocity": {
            "key": "cl_mesh[%d,%d].velocity",
            "dim": 2,
            "type": "dynamic",
            "dtype": "float32",
            "output_min": -0.25, # minimum value of the derivative of the feature (optional, required if "output_max" specified)
            "output_max": 0.75 # maximum value of the derivative of the feature (optional, required if "output_min" specified)
        }
    }
}
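Several of the constraints stated in this section can be checked mechanically once the metadata has been loaded into a dictionary. The following Python sketch is not part of MeshGraphNets.jl, and `check_metadata` is a hypothetical name. It takes the allowed values for "type" and "dtype" from this section and reads each optional min/max pair as "both or neither", which is an assumption about how the bounds are meant to be used.

```python
ALLOWED_TYPES = {"static", "dynamic"}
ALLOWED_DTYPES = {"int32", "float32", "Bool"}
# Assumption: each lower bound belongs to the upper bound of the same name.
BOUND_PAIRS = (("data_min", "data_max"), ("output_min", "output_max"))


def check_metadata(meta):
    """Return a list of human-readable problems found in a metadata dict."""
    problems = []
    names = meta.get("feature_names", [])
    for required in ("mesh_pos", "node_type"):
        if required not in names:
            problems.append(f'"{required}" is missing from "feature_names"')
    for target in meta.get("target_features", []):
        if target not in names:
            problems.append(f'target "{target}" is not in "feature_names"')
    if not 1 <= len(meta.get("dims", [])) <= 3:
        problems.append('"dims" needs 1, 2 or 3 entries')
    for name, feature in meta.get("features", {}).items():
        if feature.get("type") not in ALLOWED_TYPES:
            problems.append(f'{name}: "type" must be "static" or "dynamic"')
        if feature.get("dtype") not in ALLOWED_DTYPES:
            problems.append(f'{name}: unsupported "dtype"')
        for low, high in BOUND_PAIRS:
            # Each bound is optional, but required once its partner is given.
            if (low in feature) != (high in feature):
                problems.append(f'{name}: "{low}" and "{high}" must be given together')
    return problems


meta = {
    "dt": "time",
    "trajectory_length": 600,
    "dims": [5, 3],
    "feature_names": ["mesh_pos", "node_type", "velocity"],
    "target_features": ["velocity"],
    "features": {
        "velocity": {"key": "cl_mesh[%d,%d].velocity", "dim": 2,
                     "type": "dynamic", "dtype": "float32", "output_min": -0.25},
    },
}
print(check_metadata(meta))
```

The example dictionary deliberately omits "output_max", so the check reports one problem for the velocity feature. Note that the annotated example file uses `#` comments for explanation only; standard JSON parsers do not accept comments, so they are assumed to be absent from a real meta.json.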
The following table describes each metadata entry in detail:
Metadata | Data Type | Description |
---|---|---|
"dt" | String | each trajectory needs to have an entry for timesteps with the given key |
"trajectory_length" | Integer | each trajectory needs to have the same length, i.e. the same number of steps |
"dims" | Vector{Integer} | dimensions of the mesh; the mesh can be 1-, 2- or 3-dimensional |
"edges" | - | if you specify "data_min" and "data_max" , offline normalization is used, online otherwise |
"no_edges_node_types" | Vector{Integer} | if you want to exclude node types from the mesh, include them here |
"feature_names" | Vector{String} | list all features that are also used as an input of the network |
"target_features" | Vector{String} | list all features that the network should predict; they have to be part of "feature_names" |
Each feature has its own metadata:
Feature Metadata | Data Type | Description |
---|---|---|
"key" | String | the HDF5 key structure is described further below |
"split" | Bool | keys are split at the end (e.g. "cl_mesh[%d,%d].pos[1]" and "cl_mesh[%d,%d].pos[2]" ) |
"dim" | Integer | dimension of the feature |
"type" | String | "static" if the feature does not change over time, "dynamic" otherwise |
"dtype" | String | possible data types: "int32" , "float32" , "Bool" |
"onehot" | Bool | can be used if you want to convert types represented as Integer to a one-hot vector |
"data_min" | Float | if you specify "data_min" and "data_max" , offline normalization is used, online otherwise |
"data_max" | Float | see "data_min" |
"target_min" | Float | equivalent to "data_min" and "data_max" , specifies the interval for the normalization target |
"target_max" | Float | see "target_min" |
"output_min" | Float | equivalent to "data_min" and "data_max" , specifies the interval of the derivative of the feature |
"output_max" | Float | see "output_min" |
The structure of the HDF5 key has to follow one rule:
⚠️ Square brackets are exclusively used
- once for the index of the mesh point (e.g. "cl_mesh[%d,%d].cellType" ) and
- once at the end of the key if "split" is set to true for the feature.
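To make the bracket rule concrete, the following Python sketch lists the keys that one key template would stand for. It is an illustration under stated assumptions, not the lookup used by MeshGraphNets.jl: it assumes one key per mesh point, that mesh indices start at 1 (only the component suffixes `[1]` and `[2]` are confirmed by the split example), and that a template contains exactly one pair of square brackets. `expand_key` and its parameters are hypothetical names.

```python
import itertools


def expand_key(template, dims, split_dim=None, first_index=1):
    """List the concrete keys for every mesh point (and component, if split)."""
    # Reading of the rule: the template itself holds only the mesh-index brackets.
    if template.count("[") != 1 or template.count("]") != 1:
        raise ValueError("expected exactly one pair of square brackets")
    ranges = [range(first_index, first_index + n) for n in dims]
    keys = []
    for point in itertools.product(*ranges):
        base = template % point  # fills one %d per mesh dimension
        if split_dim is None:
            keys.append(base)
        else:
            # Split features get one trailing [component] suffix per component.
            keys.extend(f"{base}[{c}]" for c in range(1, split_dim + 1))
    return keys


keys = expand_key("cl_mesh[%d,%d].pos", dims=(5, 3), split_dim=2)
print(len(keys))  # 5 * 3 mesh points with 2 components each
print(keys[:3])
```

Under these assumptions, a split two-dimensional feature on a (5, 3) mesh corresponds to 30 keys, starting with `cl_mesh[1,1].pos[1]`.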
CylinderFlow Metadata
The metadata file (taken from the CylinderFlow example) has the following structure:
{
    "dt": 0.01, # time delta between steps in the data
    "trajectory_length": 600, # length of trajectories (i.e. number of steps) inside data files
    "n_trajectories": 1000, # number of trajectories inside train.h5
    "n_trajectories_valid": 100, # number of trajectories inside valid.h5
    "dims": 2, # dimension of the mesh
    "feature_names": [ # names of all features, mesh_pos and node_type are required
        "cells",
        "mesh_pos",
        "node_type",
        "velocity"
    ],
    "target_features": [ # names of target features (i.e. quantities of interest) as output of the network
        "velocity"
    ],
    "features": { # detailed information of the features given above
        "cells": { # name of the feature given above
            "type": "static", # "static" if the feature does not change over time, "dynamic" otherwise
            "dim": 3, # dimension of the feature
            "shape": [ # individual dimensions of the feature, one dimension can be inferred from data via -1
                1,
                -1,
                3
            ],
            "dtype": "int32" # data type of the feature
        },
        "mesh_pos": {
            "type": "static",
            "dim": 2,
            "shape": [
                1,
                -1,
                2
            ],
            "dtype": "float32"
        },
        "node_type": {
            "type": "static",
            "dim": 1,
            "shape": [
                1,
                -1,
                1
            ],
            "dtype": "int32",
            "onehot": true, # should the feature be represented as a one-hot vector (optional)
            "data_min": 0, # minimum value of the feature (optional, required if "data_max" specified)
            "data_max": 6 # maximum value of the feature (optional, required if "data_min" specified)
        },
        "velocity": {
            "type": "dynamic",
            "dim": 2,
            "shape": [
                600,
                -1,
                2
            ],
            "dtype": "float32"
        }
    }
}
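The "shape" entries use -1 for the one dimension that is inferred from the data. The Python sketch below shows the usual way such a placeholder is resolved from the total number of stored values. It follows the common reshape convention and is an assumption about the intended meaning, not the code used by MeshGraphNets.jl; `resolve_shape` is a hypothetical name and the node count of 1000 is made up for the example.

```python
import math


def resolve_shape(shape, n_elements):
    """Replace a single -1 in `shape` so that the product equals `n_elements`."""
    if shape.count(-1) > 1:
        raise ValueError("only one dimension can be inferred")
    known = math.prod(d for d in shape if d != -1)
    if -1 not in shape:
        if known != n_elements:
            raise ValueError("shape does not match the number of elements")
        return list(shape)
    if known == 0 or n_elements % known != 0:
        raise ValueError("number of elements is not divisible by the known dimensions")
    return [n_elements // known if d == -1 else d for d in shape]


# Velocity: 600 steps, 2 components, and a hypothetical mesh with 1000 nodes.
print(resolve_shape([600, -1, 2], 600 * 1000 * 2))
```

For the static features the first dimension is 1, so the inferred dimension is simply the number of stored values divided by the feature dimension.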