atlas.active_learning.extrapolation package

Submodules

atlas.active_learning.extrapolation.autoencoder module

Autoencoder model for dimensionality reduction and data reconstruction.

class atlas.active_learning.extrapolation.autoencoder.Autoencoder(input_dim: int, l1_dim: int, l2_dim: int, bottleneck_dim: int = 2, bias_flag=True)

Bases: Module

Autoencoder model for dimensionality reduction and data reconstruction.

This class implements a simple feedforward autoencoder model using PyTorch’s nn.Module. It consists of two main components:

  • Encoder: Compresses the input data to a lower-dimensional bottleneck (latent space).

  • Decoder: Reconstructs the input data from the bottleneck representation.

encoder

A neural network stack that reduces the input dimensionality down to the specified bottleneck dimension, capturing important features in a compressed form.

Type:

nn.Sequential

decoder

A neural network stack that reconstructs the input from the bottleneck representation, attempting to match the original input as closely as possible.

Type:

nn.Sequential

Parameters:
  • input_dim (int) – Dimension of the input data.

  • l1_dim (int) – Dimension of the first hidden layer in the encoder (and the last hidden layer in the decoder).

  • l2_dim (int) – Dimension of the second hidden layer in the encoder (and the second-to-last hidden layer in the decoder).

  • bottleneck_dim (int, optional, default=2) – Dimension of the bottleneck layer (latent space) where input is compressed.

  • bias_flag (bool, optional, default=True) – Whether to include a bias term in each linear layer. Setting bias_flag to False should, in principle, make the model less dependent on the training data and more generalizable to unseen data.

forward(x):

Performs a forward pass through the encoder and decoder, compressing the input to the bottleneck and then reconstructing it back to its original dimension.

Example Usage:
--------------
>>> import torch
>>> from atlas.active_learning.extrapolation.autoencoder import Autoencoder
>>> model = Autoencoder(
...     input_dim=100, l1_dim=64, l2_dim=32, bottleneck_dim=10
... )
>>> input_data = torch.randn(1, 100)  # one sample with 100 features
>>> reconstructed_data = model(input_data)  # same shape as input_data
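
The parameter descriptions above imply a mirrored layout: the encoder narrows from input_dim through l1_dim and l2_dim to bottleneck_dim, and the decoder widens back in reverse order. As a rough illustration only, the helper below (mirrored_layer_shapes is a hypothetical name, not part of atlas) lists the (in_features, out_features) pair each linear layer would have under that assumption. Activations and any other details of the real class are not shown.

```python
def mirrored_layer_shapes(input_dim, l1_dim, l2_dim, bottleneck_dim=2):
    """Return (encoder, decoder) lists of (in_features, out_features) pairs.

    Hypothetical helper: it only mirrors the widths described in the
    parameter list and does not build any PyTorch layers.
    """
    widths = [input_dim, l1_dim, l2_dim, bottleneck_dim]
    # Encoder: consecutive pairs from the input width down to the bottleneck.
    encoder = list(zip(widths[:-1], widths[1:]))
    # Decoder: the same widths walked in reverse order.
    reversed_widths = widths[::-1]
    decoder = list(zip(reversed_widths[:-1], reversed_widths[1:]))
    return encoder, decoder


encoder, decoder = mirrored_layer_shapes(100, 64, 32, bottleneck_dim=10)
print(encoder)  # [(100, 64), (64, 32), (32, 10)]
print(decoder)  # [(10, 32), (32, 64), (64, 100)]
```
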
forward(x)

Define the computation performed at every call.

Should be overridden by all subclasses.

Note

Although the recipe for forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this since the former takes care of running the registered hooks while the latter silently ignores them.

atlas.active_learning.extrapolation.autoencoder.evaluate_reconstruction(autoencoder_model, descriptor_dict: dict, device: str | None = None, dtype=torch.float32, standardize_data: bool = False, autoencoder_path: str | Path | None = None)

Evaluates the autoencoder by computing reconstruction MAE and RMSE.
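
For reference, the two metrics can be written out in a few lines. This is a minimal sketch of the standard MAE and RMSE definitions on plain Python lists; reconstruction_errors is a hypothetical name, and how evaluate_reconstruction aggregates over the entries of descriptor_dict is not stated here.

```python
import math


def reconstruction_errors(original, reconstructed):
    """Return (MAE, RMSE) over all elements of two equally shaped 2-D lists."""
    diffs = [
        x - y
        for row_x, row_y in zip(original, reconstructed)
        for x, y in zip(row_x, row_y)
    ]
    mae = sum(abs(d) for d in diffs) / len(diffs)
    rmse = math.sqrt(sum(d * d for d in diffs) / len(diffs))
    return mae, rmse


original = [[1.0, 2.0], [3.0, 4.0]]
reconstructed = [[1.0, 2.0], [3.0, 6.0]]
mae, rmse = reconstruction_errors(original, reconstructed)
print(mae, rmse)  # 0.5 1.0
```

Because RMSE squares each error before averaging, it is never smaller than MAE and reacts more strongly to a few badly reconstructed features.
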

atlas.active_learning.extrapolation.autoencoder.get_latent_space_autoencoder(model, descriptor_dict: dict, device: str = None, dtype=torch.float32, quiet: bool = False, standardize_data: bool = False, autoencoder_path: str | Path = None)
atlas.active_learning.extrapolation.autoencoder.load_autoencoder_model(model_path: str, data_arr: ndarray = None)
atlas.active_learning.extrapolation.autoencoder.locate_standarization_files(autoencoder_path=PosixPath('.'))

Check for the existence of standardization files.

These files are stored as NumPy arrays containing the values needed to apply the same standardization scaling.

Parameters:

autoencoder_path (pl.Path) – Path for the autoencoder model

Returns:

mean_vals – Array of mean values for each feature, or False if not found.

Return type:

np.ndarray or False
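
The reason for keeping such files is that new descriptors must be scaled with the statistics of the training data, not their own. The sketch below assumes z-score standardization with a per-feature mean and standard deviation; only the mean array is documented above, so the standard deviation and both function names are assumptions made for illustration.

```python
from statistics import fmean, pstdev


def fit_standardization(rows):
    """Compute per-feature mean and population standard deviation."""
    columns = list(zip(*rows))
    mean_vals = [fmean(col) for col in columns]
    std_vals = [pstdev(col) for col in columns]
    return mean_vals, std_vals


def apply_standardization(rows, mean_vals, std_vals):
    """Scale rows with previously stored statistics (assumes no zero std)."""
    return [
        [(x - m) / s for x, m, s in zip(row, mean_vals, std_vals)]
        for row in rows
    ]


training = [[0.0, 10.0], [2.0, 30.0]]
mean_vals, std_vals = fit_standardization(training)

# New data reuses the training statistics instead of recomputing them.
new_data = [[3.0, 20.0]]
print(apply_standardization(new_data, mean_vals, std_vals))  # [[2.0, 0.0]]
```
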

atlas.active_learning.extrapolation.autoencoder_tools_aiida module

AiiDA CalcJob and Parser for the autoencoder training and latent space.

class atlas.active_learning.extrapolation.autoencoder_tools_aiida.GetConcaveHullCalculation(*args: Any, **kwargs: Any)

Bases: CalcJob

Calculation to get the concave hull of the latent space of a set of descriptors for a structure database.

classmethod define(spec)
prepare_for_submission(folder)

Write the input files that are required for the code to run.

Parameters:

folder – a Folder to temporarily write files on disk

Returns:

CalcInfo instance

class atlas.active_learning.extrapolation.autoencoder_tools_aiida.GetConcaveHullCalculationParser(*args: Any, **kwargs: Any)

Bases: Parser

Parser for the retrieved files from a MACE descriptors job.

parse(**kwargs)

Parse the retrieved files of the calculation job.

class atlas.active_learning.extrapolation.autoencoder_tools_aiida.GetLatentSpaceAutoencoderCalculation(*args: Any, **kwargs: Any)

Bases: CalcJob

Calculation to train an autoencoder and use it to get the latent space for the descriptors of a structure database.

classmethod define(spec)
prepare_for_submission(folder)

Write the input files that are required for the code to run.

Parameters:

folder – a Folder to temporarily write files on disk

Returns:

CalcInfo instance

class atlas.active_learning.extrapolation.autoencoder_tools_aiida.GetLatentSpaceAutoencoderCalculationParser(*args: Any, **kwargs: Any)

Bases: Parser

Parser for the retrieved files from a MACE descriptors job.

parse(**kwargs)

Parse the retrieved files of the calculation job.

class atlas.active_learning.extrapolation.autoencoder_tools_aiida.TrainAutoencoderCalculation(*args: Any, **kwargs: Any)

Bases: CalcJob

Implementation of a CalcJob to perform an Autoencoder training using a settings dictionary.

Inputs

settings_dict : orm.Dict

Dictionary containing training settings.

descriptors_file_path : orm.Str

Path to the descriptors to evaluate in npy format.

Outputs

model_file : orm.SinglefileData

Path of the trained Autoencoder model.

training_log : orm.SinglefileData

Log file containing the training process.

Exit Codes

420 : ERROR_INVALID_OUTPUT

Autoencoder Training calculation could not run.

classmethod define(spec)

Define the input and output specifications for the CalcJob.

prepare_for_submission(folder)

Write the input files that are required for the code to run.

Parameters:

folder – a Folder to temporarily write files on disk

Returns:

CalcInfo instance

class atlas.active_learning.extrapolation.autoencoder_tools_aiida.TrainAutoencoderCalculationParser(*args: Any, **kwargs: Any)

Bases: Parser

Parser for the retrieved files from an Autoencoder training calculation job.

parse(**kwargs)

Parse the retrieved files of the calculation job.

atlas.active_learning.extrapolation.autoencoder_tools_aiida.prepare_cli_args_autoencoder(params_list: list, settings_dict: dict)

Prepare the command line arguments for the Autoencoder training.

atlas.active_learning.extrapolation.concave_hull module

Utilities to compute the concave hull for a point cloud of atomic descriptors.

atlas.active_learning.extrapolation.concave_hull.alpha_shape(points, alpha: float, only_outer: bool = True)

Compute the 2-D alpha-shape (concave hull) of a set of points.

Parameters:
  • points ((N, 2) array-like) – Input coordinates.

  • alpha (float) – Inverse length scale. A smaller alpha gives a coarser hull (closer to the convex hull); a larger alpha follows the points more tightly. A good starting point is alpha ~ 1 / (average edge length).

  • only_outer (bool, default True) – If True, return only the outer boundary. If False, keep holes.

Return type:

shapely.geometry.Polygon | MultiPolygon
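
To make the role of alpha as an inverse length scale concrete: a common way to build an alpha-shape is to triangulate the points and keep only triangles whose circumradius is below 1 / alpha. The sketch below shows that test for a single triangle. It is an assumption about the general technique, not a description of how alpha_shape is implemented, and keep_triangle is a hypothetical name.

```python
import math


def circumradius(a, b, c):
    """Circumradius of the triangle with 2-D vertices a, b, c."""
    la, lb, lc = math.dist(b, c), math.dist(a, c), math.dist(a, b)
    s = (la + lb + lc) / 2
    # Heron's formula; clamp at zero to absorb rounding noise.
    area = math.sqrt(max(s * (s - la) * (s - lb) * (s - lc), 0.0))
    if area == 0.0:
        return math.inf  # degenerate (collinear) triangle
    return la * lb * lc / (4 * area)


def keep_triangle(a, b, c, alpha):
    """Keep a triangle if its circumradius is below 1 / alpha."""
    if alpha <= 0:
        return True  # no length limit: every triangle is kept
    return circumradius(a, b, c) < 1.0 / alpha


triangle = ((0.0, 0.0), (3.0, 0.0), (0.0, 4.0))  # 3-4-5 triangle
print(circumradius(*triangle))           # 2.5
print(keep_triangle(*triangle, 0.25))    # True: 2.5 < 4.0
print(keep_triangle(*triangle, 0.5))     # False: 2.5 >= 2.0
```

Under this criterion, lowering alpha admits larger triangles, so the resulting outline bridges wider gaps between points.
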

atlas.active_learning.extrapolation.concave_hull.check_atom_in_domain(concave_hull: ndarray, descriptors: ndarray) tuple[ndarray, ndarray, ndarray]

Check if the generated descriptors are inside the precomputed concave hull.

Parameters:
  • concave_hull (np.ndarray) – Boundary of the latent space for the database, given as its convex or concave hull.

  • descriptors (np.ndarray) – The descriptors to check, typically of shape (N, 2).

Returns:

  • np.ndarray – Array containing the descriptors that are inside the concave hull.

  • np.ndarray – Array containing the descriptors that are outside the concave hull.

  • np.ndarray – Array containing boolean values showing if the frame is inside the concave hull.
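
The three return values can be illustrated with a plain point-in-polygon test. This is a minimal sketch using ray casting on lists of (x, y) tuples; the function names are hypothetical, the actual function may rely on a geometry library instead, and points lying exactly on the boundary are not treated specially here.

```python
def point_in_polygon(point, polygon):
    """Ray casting: count edge crossings of a ray going in +x from the point."""
    x, y = point
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        # Only edges that straddle the horizontal line through y can cross it.
        if (y1 > y) != (y2 > y):
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


def split_by_domain(hull, descriptors):
    """Return (points inside, points outside, boolean mask)."""
    mask = [point_in_polygon(p, hull) for p in descriptors]
    inside = [p for p, m in zip(descriptors, mask) if m]
    outside = [p for p, m in zip(descriptors, mask) if not m]
    return inside, outside, mask


# L-shaped (concave) hull: the notch around (3, 3) is outside the domain.
hull = [(0, 0), (4, 0), (4, 2), (2, 2), (2, 4), (0, 4)]
points = [(1, 1), (3, 3), (1, 3), (5, 1)]
inside, outside, mask = split_by_domain(hull, points)
print(mask)  # [True, False, True, False]
```

The point (3, 3) would count as inside the convex hull of the same vertices, which is why a concave boundary gives a stricter extrapolation check.
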

atlas.active_learning.extrapolation.concave_hull.check_traj_in_domain(concave_hull: ndarray, descriptor_dict: dict, hull_scale_factor: float = 0.0) tuple[ndarray, ndarray, ndarray, list | ndarray | None]

Check if the generated descriptors are inside the precomputed concave hull.

Parameters:
  • concave_hull (np.ndarray) – Boundary of the latent space for the database, given as its convex or concave hull.

  • descriptor_dict (dict) –

    Descriptor dictionary containing the descriptors for each frame. The structure is as follows:

        {
            uuid: {
                'latent_space': np.ndarray,
                'descriptors': np.ndarray,
                'is_extrapolating': np.ndarray,
            }
        }

  • hull_scale_factor (float, optional) – Tolerance percentage to enlarge the concave hull. For example, 0.1 adds 10% tolerance. Default is 0.0.

Returns:

  • np.ndarray – Array containing the descriptors that are inside the concave hull.

  • np.ndarray – Array containing the descriptors that are outside the concave hull.

  • np.ndarray – Array containing boolean values showing if the frame is inside the concave hull.
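
One way to read hull_scale_factor is as a uniform enlargement of the hull about its center, so that 0.1 moves every vertex 10% further out. The sketch below implements that reading with the vertex mean as the center; both the interpretation and the name scale_polygon are assumptions, since the exact scaling rule is not described above.

```python
def scale_polygon(vertices, scale_factor):
    """Enlarge a polygon about the mean of its vertices by (1 + scale_factor)."""
    cx = sum(x for x, _ in vertices) / len(vertices)
    cy = sum(y for _, y in vertices) / len(vertices)
    growth = 1.0 + scale_factor
    return [(cx + growth * (x - cx), cy + growth * (y - cy)) for x, y in vertices]


square = [(0.0, 0.0), (2.0, 0.0), (2.0, 2.0), (0.0, 2.0)]
scaled = scale_polygon(square, 0.1)
print([(round(x, 3), round(y, 3)) for x, y in scaled])
# [(-0.1, -0.1), (2.1, -0.1), (2.1, 2.1), (-0.1, 2.1)]
```

With a factor of 0.0 the polygon is returned unchanged, which matches the documented default of no tolerance.
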

atlas.active_learning.extrapolation.concave_hull.get_concave_hull_python(latent_space: ndarray, target_alpha_range: tuple[float, float] = (3.0, 8.0), default_alpha_if_issues: float = 5, nn_dist_scale_factor: float = 1.5, frac_points_allowed_out: float = 0.002, n_attempts: int = 20, decrease_factor_multiplier: float = 0.95, use_alpha: float | None = None) tuple[ndarray, float]

Compute the concave hull of a set of points using the alpha-shape algorithm.

Parameters:
  • latent_space (np.ndarray) – The input points (N, 2) for which to compute the concave hull.

  • target_alpha_range (tuple[float, float], optional) – The desired (min_alpha, max_alpha) range for the alpha parameter. Alpha will be clipped to this range. Defaults to (3.0, 8.0).

  • default_alpha_if_issues (float, optional) – Default alpha value to use if the nearest-neighbor distance calculation is not possible (e.g., too few points) or other issues arise. Defaults to 5 (close to the midpoint of the common 3-8 range).

  • nn_dist_scale_factor (float, optional) – Scaling factor for the alpha candidate calculation: alpha_candidate = nn_dist_scale_factor / mean_nn_dist. Defaults to 1.5.

  • frac_points_allowed_out (float, optional) – The maximum fraction of points allowed to be outside the concave hull. If the fraction of points outside the hull exceeds this value, alpha will be decreased iteratively until the condition is met or alpha reaches zero. Defaults to 0.002 (0.2%).

  • n_attempts (int, optional) – Number of attempts to compute the concave hull by adjusting alpha.

  • use_alpha (float, optional) – If provided, this alpha value will be used directly to compute the concave hull without optimization. Defaults to None.

Returns:

  • np.ndarray – The coordinates of the vertices of the concave hull (N, 2). Returns an empty array np.empty((0,2)) if a hull cannot be formed.

  • float – The alpha value used to compute the concave hull.
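
The starting value of alpha described by nn_dist_scale_factor, target_alpha_range and default_alpha_if_issues can be sketched as follows. This is a simplified illustration with a brute-force nearest-neighbor search; initial_alpha is a hypothetical name, and the iterative decrease controlled by frac_points_allowed_out and n_attempts is not reproduced.

```python
import math


def mean_nearest_neighbor_distance(points):
    """Average distance from each point to its closest other point (O(N^2))."""
    total = 0.0
    for i, p in enumerate(points):
        total += min(math.dist(p, q) for j, q in enumerate(points) if j != i)
    return total / len(points)


def initial_alpha(points, target_alpha_range=(3.0, 8.0),
                  default_alpha_if_issues=5.0, nn_dist_scale_factor=1.5):
    """alpha_candidate = nn_dist_scale_factor / mean_nn_dist, clipped to range."""
    if len(points) < 2:
        return default_alpha_if_issues  # no neighbor distance available
    mean_nn_dist = mean_nearest_neighbor_distance(points)
    if mean_nn_dist == 0.0:
        return default_alpha_if_issues  # all points coincide
    candidate = nn_dist_scale_factor / mean_nn_dist
    low, high = target_alpha_range
    return min(max(candidate, low), high)


dense = [(0.0, 0.0), (0.25, 0.0), (0.0, 0.25), (0.25, 0.25)]
sparse = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (1.0, 1.0)]
print(initial_alpha(dense))         # 6.0  (1.5 / 0.25, inside the range)
print(initial_alpha(sparse))        # 3.0  (1.5 / 1.0 clipped up to 3.0)
print(initial_alpha([(0.0, 0.0)]))  # 5.0  (fallback for too few points)
```
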

atlas.active_learning.extrapolation.concave_hull.get_optimized_concave_hull(latent_space: ndarray, target_alpha_range: tuple[float, float] = (0.0, 10.0), frac_points_allowed_out: float = 0.02) tuple[ndarray, float]
atlas.active_learning.extrapolation.concave_hull.plot_concave_hull(concave_hull: ndarray, latent_space: ndarray = None, point_inside: ndarray = None, point_outside: ndarray = None, scaled_hull: ndarray | list[ndarray] = None, filename: str = None, alpha: float = None, plot_density: bool = False, title: str = 'Concave Hull')

Generate a plot for the concave hull area in 2D space, including in and out of domain points if provided.

Parameters:
  • concave_hull (np.ndarray) – The coordinates of the concave hull vertices (N, 2).

  • latent_space (np.ndarray, optional) – The full set of points in the latent space (N, 2).

  • point_inside (np.ndarray, optional) – Points inside the concave hull (M, 2).

  • point_outside (np.ndarray, optional) – Points outside the concave hull (K, 2).

  • scaled_hull (np.ndarray | list[np.ndarray], optional) – The coordinates of the scaled concave hull vertices (N, 2).

  • filename (str, optional) – The filename to save the plot. Defaults to ‘concave_hull.png’.

  • alpha (float, optional) – If provided, a single alpha value will be used to compute the concave hull, instead of attempting to optimize it.

  • plot_density (bool, optional) – Whether to color points by density. Defaults to False.

  • title (str, optional) – The title of the plot. Defaults to ‘Concave Hull’.

atlas.active_learning.extrapolation.morphological_closing module

Utilities for boundary determination using morphological closing.

atlas.active_learning.extrapolation.morphological_closing.create_image_mask(data_X, data_Y, disk_size=10, figsize=(10, 8), dpi=100, threshold=250, point_size=5)

Creates a mask for 2D points by applying morphological closing to a plot of the points.

atlas.active_learning.extrapolation.morphological_closing.extract_boundaries_from_mask(solid_mask, fig_seg, ax_seg, img_w, img_h, contour_level=0.5)

Extracts data-space boundaries from the pixel mask.

atlas.active_learning.extrapolation.morphological_closing.filter_points_by_mask(data_X, data_Y, solid_mask, ax_seg, img_w, img_h)

Filters points based on whether they fall inside the generated mask.

atlas.active_learning.extrapolation.morphological_closing.process_morphological_closing(data_X, data_Y, disk_size=1, figsize=(10, 8), dpi=100, threshold=250, point_size=5) dict

Extract shape and filter points using morphological closing.

Returns a dictionary containing the filtered points, selected indices, boundaries, and the generated mask.
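
Morphological closing is a dilation followed by an erosion with the same structuring element; it fills gaps smaller than the element while leaving the outer extent of the shape roughly unchanged. The sketch below applies it to a tiny binary grid in pure Python. It uses a square window instead of a disk for brevity, and it is only meant to show the operation, not the image pipeline used by the functions above.

```python
def _window(mask, row, col, radius):
    """Values inside the square window around (row, col), clipped to the grid."""
    rows = range(max(0, row - radius), min(len(mask), row + radius + 1))
    cols = range(max(0, col - radius), min(len(mask[0]), col + radius + 1))
    return [mask[r][c] for r in rows for c in cols]


def dilate(mask, radius):
    """A pixel becomes 1 if any pixel in its window is 1."""
    return [[int(any(_window(mask, r, c, radius))) for c in range(len(mask[0]))]
            for r in range(len(mask))]


def erode(mask, radius):
    """A pixel stays 1 only if every pixel in its window is 1."""
    return [[int(all(_window(mask, r, c, radius))) for c in range(len(mask[0]))]
            for r in range(len(mask))]


def closing(mask, radius):
    """Morphological closing: dilation followed by erosion."""
    return erode(dilate(mask, radius), radius)


mask = [
    [0, 0, 0, 0, 0, 0, 0],
    [0, 0, 0, 0, 0, 0, 0],
    [0, 0, 1, 0, 1, 0, 0],  # two points separated by a one-pixel gap
    [0, 0, 0, 0, 0, 0, 0],
    [0, 0, 0, 0, 0, 0, 0],
]
closed = closing(mask, 1)
print(closed[2])  # [0, 0, 1, 1, 1, 0, 0]
```

A larger window (the analogue of disk_size) bridges wider gaps, at the cost of merging regions that may be genuinely separate.
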

atlas.active_learning.extrapolation.quadtree module

Utility functions for QuadTree operations.

class atlas.active_learning.extrapolation.quadtree.QuadTree(boundary: Rectangle, capacity: int = 4, initial_capacity_fraction: float | None = None, initial_data_amount: int = 0, data_range_x: float | None = None, data_range_y: float | None = None)

Bases: object

A QuadTree for spatial partitioning of 2D points.

The QuadTree subdivides space into quadrants to efficiently manage and query points. It is used here to identify dense regions in a 2D space by recursively subdividing areas until a certain density criterion is met, which makes it possible to select regions for alpha-shape computation in multi-resolution datasets.

find_dense_leaves(max_width_threshold: float) list[Rectangle]

Traverse the tree and return the boundaries of leaf nodes that are smaller than the threshold (indicating high density).

Parameters:

max_width_threshold (float) – The width (2 * w) below which a node is considered ‘dense’.

Returns:

A list of bounding boxes representing dense regions.

Return type:

list[Rectangle]

find_dense_leaves_density(min_density_threshold: float, total_area: float) list[Rectangle]

Traverse the tree and return leaf nodes that meet a density requirement. Density = (number of points) / (area of node).

insert(point: Point) bool
subdivide() None
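
The idea behind the width-based search can be shown with a compact quadtree: nodes split once they exceed their capacity, so small leaves only appear where points are close together. The classes below (Rect, MiniQuadTree) are hypothetical stand-ins written for this illustration, not the atlas implementation; treating only non-empty leaves as dense and the max_depth guard are both assumptions.

```python
from dataclasses import dataclass


@dataclass
class Rect:
    """Axis-aligned box given by its center (x, y) and half-extents (w, h)."""
    x: float
    y: float
    w: float
    h: float

    def contains(self, point):
        px, py = point
        return abs(px - self.x) <= self.w and abs(py - self.y) <= self.h


class MiniQuadTree:
    def __init__(self, boundary, capacity=4, depth=0, max_depth=16):
        self.boundary = boundary
        self.capacity = capacity
        self.depth = depth
        self.max_depth = max_depth  # stops endless splitting on duplicate points
        self.points = []
        self.children = []

    def subdivide(self):
        b = self.boundary
        hw, hh = b.w / 2, b.h / 2
        self.children = [
            MiniQuadTree(Rect(b.x + sx * hw, b.y + sy * hh, hw, hh),
                         self.capacity, self.depth + 1, self.max_depth)
            for sx in (-1, 1) for sy in (-1, 1)
        ]
        # Push the points stored here down into the new quadrants.
        for p in self.points:
            self._insert_into_children(p)
        self.points = []

    def _insert_into_children(self, point):
        # any() stops at the first child that accepts the point.
        return any(child.insert(point) for child in self.children)

    def insert(self, point):
        if not self.boundary.contains(point):
            return False
        if not self.children:
            if len(self.points) < self.capacity or self.depth >= self.max_depth:
                self.points.append(point)
                return True
            self.subdivide()
        return self._insert_into_children(point)

    def dense_leaves(self, max_width_threshold):
        """Boundaries of non-empty leaves narrower than the threshold."""
        if self.children:
            return [leaf for child in self.children
                    for leaf in child.dense_leaves(max_width_threshold)]
        if self.points and 2 * self.boundary.w < max_width_threshold:
            return [self.boundary]
        return []


tree = MiniQuadTree(Rect(0, 0, 8, 8), capacity=1)
for p in [(1, 1), (3, 3), (-5, -5)]:
    tree.insert(p)

dense = tree.dense_leaves(max_width_threshold=4)
print([(r.x, r.y, 2 * r.w) for r in dense])  # [(1.0, 1.0, 2.0), (3.0, 3.0, 2.0)]
```

The isolated point at (-5, -5) ends up in a leaf of width 8 and is therefore not reported, while the two nearby points force repeated subdivision down to leaves of width 2.
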
class atlas.active_learning.extrapolation.quadtree.Rectangle(x: float, y: float, w: float, h: float)

Bases: object

Object representing a rectangle for a QuadTree.

Axis-aligned rectangle represented by its center (x, y) and half-width (w) and half-height (h).

contains(point) bool
get_area() float
h: float
w: float
x: float
property xmax: float
property xmin: float
y: float
property ymax: float
property ymin: float
atlas.active_learning.extrapolation.quadtree.check_if_points_in_polygons(alpha_shapes: list, data: list[Point])
atlas.active_learning.extrapolation.quadtree.separate_clusters(dense_boxes: list[Rectangle]) list[list[Rectangle]]

Groups dense boxes into connected clusters. Boxes are considered connected if they touch or overlap.
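
Grouping by contact amounts to finding connected components in a graph whose nodes are boxes and whose edges link boxes that touch or overlap. The sketch below does this for boxes given as (x, y, w, h) tuples with a center and half-extents. The names are hypothetical, and counting corner-only contact as touching is an assumption.

```python
def boxes_touch(a, b):
    """True if two center/half-extent boxes overlap or share an edge or corner."""
    ax, ay, aw, ah = a
    bx, by, bw, bh = b
    return abs(ax - bx) <= aw + bw and abs(ay - by) <= ah + bh


def group_connected(boxes):
    """Return lists of boxes, one list per connected cluster."""
    unvisited = set(range(len(boxes)))
    clusters = []
    while unvisited:
        start = min(unvisited)
        unvisited.remove(start)
        stack, members = [start], [start]
        # Depth-first search over boxes that are in contact.
        while stack:
            current = stack.pop()
            linked = [j for j in sorted(unvisited)
                      if boxes_touch(boxes[current], boxes[j])]
            for j in linked:
                unvisited.remove(j)
                stack.append(j)
                members.append(j)
        clusters.append([boxes[k] for k in sorted(members)])
    return clusters


boxes = [
    (1, 1, 1, 1),    # A
    (3, 1, 1, 1),    # B shares an edge with A
    (3, 3, 1, 1),    # C shares an edge with B
    (10, 10, 1, 1),  # D is isolated
]
clusters = group_connected(boxes)
print([len(c) for c in clusters])  # [3, 1]
```
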

atlas.active_learning.extrapolation.quadtree.setup_quadtree(all_points, offset_frac: float = 0.1, data_frac_capacity: float = 0.015)
atlas.active_learning.extrapolation.quadtree.visualize_quadtree(qt: QuadTree, points: list[Point], clusters: list[list[Rectangle]], alpha_shapes: list[dict] | None = None, filename: str | Path = 'quadtree_viz.png', frac_outside: float = 0.0, show: bool = False)

Visualizes the quadtree structure, data points, and clusters.

atlas.active_learning.extrapolation.train_autoencoder module

Train an autoencoder for dimensionality reduction.

atlas.active_learning.extrapolation.train_autoencoder.return_dataset_loader(dataset: str | Path | ndarray, train_frac: float, valid_frac: float, dtype: str, batch_size: int, rng_seed: int)
atlas.active_learning.extrapolation.train_autoencoder.run_training(args)

Train an autoencoder model for dimensionality reduction.

Parameters:

args (Namespace) –

Namespace object containing the following attributes:

  • dataset : str

    Path to the dataset file.

  • device : str

    Device to run the model on. If not given, the device is set to cuda if available, else cpu.

  • dtype : str

    Data type to use for the model. One of float32 or float64. Default is float32.

  • lr : float

    Learning rate for the optimizer.

  • num_epochs : int

    Number of epochs to train the model.

  • batch_size : int

    Batch size for training.

  • patience : int

    Number of epochs to wait before reducing the learning rate.

  • train_frac : float

    Fraction of the dataset to use for training.

  • valid_frac : float

    Fraction of the dataset to use for validation.

  • test_frac : float

    Fraction of the dataset to use for testing.

  • l1_hidden_dim : int

    Number of hidden units in the first layer of the autoencoder.

  • l2_hidden_dim : int

    Number of hidden units in the second layer of the autoencoder.

  • weight_decay : float

    Weight decay for the optimizer.

  • bias_flag : bool

    Flag to include bias in the linear layers.

  • verbose : bool

    Flag to print verbose output.

  • rng_seed : int

    Random seed for reproducibility.

  • model_path : str

    Path to save the trained model.

  • wandb : bool

    Flag to enable logging to wandb.

  • wandb_project : str

    Name of the wandb project.

  • wandb_name : str

    Name of the wandb run.

  • standardize_data : bool

    Whether to normalize the data before training the autoencoder.

Returns:

Autoencoder model trained for dimensionality reduction.

Return type:

Autoencoder
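
The train_frac, valid_frac and test_frac attributes together with rng_seed describe a seeded random partition of the dataset. A minimal sketch of such a partition on sample indices is shown below. The name split_indices and the rounding rule are assumptions, and the real split_dataset works on arrays or tensors, not index lists.

```python
import random


def split_indices(n_samples, train_frac, valid_frac, test_frac, rng_seed=None):
    """Shuffle indices with a fixed seed and cut them into three parts."""
    if train_frac + valid_frac + test_frac > 1.0 + 1e-8:
        raise ValueError("fractions must not sum to more than 1")
    indices = list(range(n_samples))
    random.Random(rng_seed).shuffle(indices)  # local RNG, reproducible per seed
    n_train = round(train_frac * n_samples)
    n_valid = round(valid_frac * n_samples)
    # Never take more test samples than remain after the first two parts.
    n_test = min(round(test_frac * n_samples), n_samples - n_train - n_valid)
    train = indices[:n_train]
    valid = indices[n_train:n_train + n_valid]
    test = indices[n_train + n_valid:n_train + n_valid + n_test]
    return train, valid, test


train, valid, test = split_indices(10, 0.8, 0.1, 0.1, rng_seed=0)
print(len(train), len(valid), len(test))  # 8 1 1
```

Using a dedicated random.Random instance keeps the split reproducible without touching the global random state.
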

atlas.active_learning.extrapolation.train_autoencoder.safe_to_gpu(arrays, target_dtype, device)

Checks if there is enough VRAM to move an array to the GPU. Returns True if safe, False if it will likely OOM.
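
The kind of estimate such a check relies on is simple arithmetic: the number of elements times the bytes per element, compared with the free memory. The sketch below shows that arithmetic in isolation. The function name, the 10% headroom and the shape-based interface are assumptions; in a PyTorch setting the free byte count could be obtained from torch.cuda.mem_get_info(), which is not called here.

```python
import math

BYTES_PER_ELEMENT = {"float32": 4, "float64": 8}


def fits_in_memory(shapes, target_dtype, free_bytes, safety_margin=0.9):
    """True if arrays with the given shapes fit into a share of the free memory."""
    n_elements = sum(math.prod(shape) for shape in shapes)
    required = n_elements * BYTES_PER_ELEMENT[target_dtype]
    return required <= safety_margin * free_bytes


free_bytes = 2 * 1024**3  # pretend 2 GiB are free
shapes = [(1_000_000, 256)]
print(fits_in_memory(shapes, "float32", free_bytes))  # True  (1.024e9 bytes)
print(fits_in_memory(shapes, "float64", free_bytes))  # False (2.048e9 bytes)
```
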

atlas.active_learning.extrapolation.train_autoencoder.split_dataset(data: str | Path | ndarray, train_frac: float, valid_frac: float, test_frac: float, rng_seed: int = None, device: str = None, dtype=torch.float32)
atlas.active_learning.extrapolation.train_autoencoder.train_loop(data_loader, model, loss_fn, optimizer, device)
atlas.active_learning.extrapolation.train_autoencoder.val_loop(data_loader, model, loss_fn, device)

Module contents

Extrapolation utilities.