
Reimplementation of Ad projection operators #1182

Open

keileg opened this issue Jun 7, 2024 · 12 comments

keileg commented Jun 7, 2024

Background

In the Ad operator framework, the classes SubdomainProjection, MortarProjection, Trace and InvTrace all represent mappings between grid entities of various kinds. These are all implemented as projection matrices that, during parsing of an operator tree, are left multiplied with arrays and matrices. This is unsatisfactory for two reasons:

  1. Constructing the projection matrices is slow and adds significantly to the computational cost of working with the Ad operators. As a partial remedy, some if not all of the projection classes construct the projection matrices during initialization of the projection object, but to a large degree this just moves the computational burden. The ahead-of-time construction will also be hard to make compatible with the caching outlined in Implement an Ad parser class #1181.
  2. Projection operations are in reality slicing (for a global-to-local restriction) or filling in of selected rows in large arrays and matrices (for local-to-global prolongation). Representing this as matrix-matrix / matrix-vector products likely adds significantly to the computational cost (see the illustration after this list).
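To make the point concrete, the following is a minimal illustration (plain numpy/scipy, not PorePy code) of the same restriction realized as a matrix product and as direct slicing:

import numpy as np
import scipy.sparse as sps

v = np.arange(6, dtype=float)  # Global vector on six cells.
indices = np.array([1, 4])     # Cells belonging to the local subdomain.

# Matrix representation: build a 2 x 6 restriction matrix and multiply.
P = sps.csr_matrix((np.ones(2), (np.arange(2), indices)), shape=(2, 6))
restricted_by_matrix = P @ v

# The same operation as plain slicing, with no matrix construction at all.
restricted_by_slicing = v[indices]

assert np.allclose(restricted_by_matrix, restricted_by_slicing)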

Suggested change

The projection methods, say SubdomainProjection.face_restriction(), currently return pp.ad.SparseArrays. Instead, they should return a new slicer object that roughly looks like this:

import numpy as np
import porepy as pp
import scipy.sparse as sps


class RestrictionSlicer:

    _target_indices: np.ndarray
    # EK note to self: In the future, this may also be a PETSc IS object

    _weights: np.ndarray
    # Weights to be assigned to the different rows

    def __init__(self, dim: int, target_indices: np.ndarray, weights: np.ndarray) -> None:
        self._target_indices = target_indices
        self._weights = weights

    # We may need to use typing overload here to make sure mypy is happy: The return type
    # should always be the same as the type of the input other.
    def __matmul__(self, other: pp.ad.AdArray | np.ndarray) -> pp.ad.AdArray | np.ndarray:
        # In its simplest form (if all of self._weights is one), this can be implemented as
        return other[self._target_indices]
        # However, for non-unitary weights, we need special treatment. For np.ndarrays,
        # this is something like
        return other[self._target_indices] * self._weights
        # AdArrays are a bit more complicated: In principle we can make a diagonal matrix
        # of the weights and left multiply (thus scaling the rows), so
        scaling = sps.dia_array((self._weights, 0), shape=(self._weights.size, self._weights.size))
        return scaling @ other[self._target_indices]
        # This risks being slow, though (profiling is needed). The alternative is to work
        # directly with the sparse data structure in the AdArray, using
        # pp.matrix_operations.rldecode to expand the weight array to the right size
        # (the number of nonzeros can differ from one row to the next). A rough outline:
        tmp_mat = other[self._target_indices]
        data_ind = pp.matrix_operations.rldecode(
            np.arange(tmp_mat.jac.shape[0]), np.diff(tmp_mat.jac.indptr)
        )
        tmp_mat.jac.data *= self._weights[data_ind]
        return tmp_mat
        # A final question is whether slicing (all calls above to
        # other[self._target_indices]) is sufficiently efficient, or if we should rather
        # construct a new array by exploiting the CSR format in scipy.sparse. Again,
        # profiling will be needed.

    # NB: The other dunder methods (__add__ etc.) should possibly raise an error.

class ProjectionSlicer:

    _target_indices: np.ndarray
    _target_num_rows: int

    _weights: np.ndarray

    def __init__(self, dim: int, mdg: pp.MixedDimensionalGrid, domain: list[pp.Grid]) -> None:
        # self._target_indices and self._target_num_rows are computed here.
        ...

    def __matmul__(self, other: pp.ad.AdArray | np.ndarray) -> pp.ad.AdArray | np.ndarray:
        if isinstance(other, np.ndarray):
            # Use zeros, not uninitialized memory: rows not written to below
            # must be zero in the prolonged array.
            result = np.zeros(self._target_num_rows, dtype=other.dtype)
            result[self._target_indices] = other
            return result
        else:  # pp.ad.AdArray
            res_val = np.zeros(self._target_num_rows, dtype=other.val.dtype)
            res_jac = sps.csr_matrix((self._target_num_rows, other.jac.shape[1]))
            res_val[self._target_indices] = other.val
            res_jac[self._target_indices] = other.jac
            return pp.ad.AdArray(res_val, res_jac)
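As a quick sanity check of the intended semantics (a sketch exercising only the first, unit-weight branch of __matmul__ above):

v = np.arange(10, dtype=float)
indices = np.array([2, 5, 7])
restrict = RestrictionSlicer(dim=1, target_indices=indices, weights=np.ones(indices.size))
assert np.allclose(restrict @ v, v[indices])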

Comments:

  1. It is not clear whether we need one slicer each for SubdomainProjection, MortarProjection etc.
  2. Similar thoughts may apply to the geometry basis functions pp.models.geometry.basis; research is needed.
  3. The implementation should change the Ad backend only, no changes in the computational grid.
  4. Testing will be needed.

Regarding the divergence class

The class pp.ad.Divergence represents a differential operator rather than a mapping. This cannot readily be implemented as slicing, and although improvements should be possible there as well, it will not be part of this issue.

Dependencies

This can be done independently of #1179 and #1181, but should not be done in parallel with them.


keileg commented Jun 7, 2024

Some of the functionality in pp.models.geometry may also be suited for this approach.

@IvarStefansson

Some of the functionality in pp.models.geometry may also be suited for this approach.

That's an interesting comment. Is there any chance this approach can help us with representation of tensors and their products?


keileg commented Jun 14, 2024

Perhaps, it depends on what you mean. I suggest we discuss in person.


keileg commented Nov 16, 2024

Implementing this will require three main steps, each split into separate tasks.

Preparatory steps

Implementation of the new projection operators

  • Implement a first version of the RestrictionSlicer and ProjectionSlicer, outlined above. There are several suggestions for how to implement __matmul__; we should probably implement all of them and let benchmarking (see below) decide which to use.
  • Make some simple tests to verify the classes. Copilot should be able to handle this.
  • Prepare stand-alone benchmarking for the slicing operators (applied to np.ndarrays and pp.ad.AdArrays); see the sketch after this list. Hopefully, Copilot can help with this. It is important to consider both small and big arrays, and slicing/prolongation to few and to many of the rows. We should also benchmark the existing implementation of multiplying with a sparse matrix.
  • Try to use the benchmarking to decide which of the implementations of __matmul__ should be used. It could be that the results are similar for some cases, in which case we either prioritize simplicity, or keep several options for a while.
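A minimal sketch of such a stand-alone benchmark, assuming the RestrictionSlicer above and using the explicit sparse-matrix product as the baseline (sizes and densities are arbitrary placeholders):

import timeit

import numpy as np
import scipy.sparse as sps

n = 10**6
v = np.random.rand(n)
indices = np.sort(np.random.choice(n, size=n // 10, replace=False))
slicer = RestrictionSlicer(dim=1, target_indices=indices, weights=np.ones(indices.size))

# Baseline: the current approach, multiplication with an explicit sparse matrix.
P = sps.csr_matrix(
    (np.ones(indices.size), (np.arange(indices.size), indices)),
    shape=(indices.size, n),
)

t_slice = timeit.timeit(lambda: slicer @ v, number=100)
t_matrix = timeit.timeit(lambda: P @ v, number=100)
print(f"slicer: {t_slice:.4f} s, sparse matmul: {t_matrix:.4f} s")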

Inclusion of the new slicers in the Ad operator system

  • Introduce the new slicer classes in the SubdomainProjection, to replace the existing projections. It could be that we need to introduce a new Ad operator to ensure delayed evaluation (the existing projections use SparseArrays), but this should be a relatively simple task.
  • Introduce the new slicer classes in the Ad parsing. My feeling is that we need few if any changes to the parsing in pp.ad.Operator, but we may need to modify pp.ad.AdArray so that it knows how to deal with the new classes (simply delegate the operation to the new class by calling its __matmul__).
  • Expand the test suite for Ad parsing to cover the new objects.
  • Modify the existing test of SubdomainProjections. Currently this is based on the projection matrix being explicitly available, but when we only have the action of the slicer, we need to construct the matrix by applying the slicer to one unit vector at a time and assembling the columns (see the sketch after this list).
  • Benchmarking of the new implementation, using the full Ad framework, again with small and large arrays and small and large slices. Copilot should be able to help. Compare with the existing implementation of the SubdomainProjection.
  • Introduce the new functionality also to the MortarProjection. Modify all projections, but leave out the sign_of_mortar_side for now.
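A minimal sketch of the matrix reconstruction needed for the modified test; operator_as_dense_matrix is a hypothetical helper, and op is any linear operator (slicer or otherwise) supporting @ with numpy vectors:

import numpy as np


def operator_as_dense_matrix(op, num_columns: int) -> np.ndarray:
    # Column j of the matrix representation of a linear operator is op @ e_j,
    # so applying op to one unit vector at a time recovers the full matrix.
    columns = []
    for j in range(num_columns):
        e_j = np.zeros(num_columns)
        e_j[j] = 1.0
        columns.append(op @ e_j)
    return np.column_stack(columns)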


keileg commented Nov 17, 2024

Benchmarking (to be updated)

Observations from benchmarking using line_profiler (which, by the way, seems to have substantial overhead with my setup; comparing relative rather than absolute numbers should still give some indication):

Running flow_benchmark_2d_case3, with a grid of 24K cells in the matrix, ~800 mortar cells

  1. Substantial time is spent in the mortar projections. Less so in the subdomain projections (this is reasonable, since the physics is homogeneous across the subdomains).
  2. The time spent on defining the equations (which will include the projections) is roughly equal to that of the mpfa discretization. There are indications that the mortar projections take the main part of this construction time, though this is not certain, given the noise in profiling.
  3. Time for assembly is (for this case) considerably less than for defining equations and discretizing.
  4. Time for the linear solve is likewise modest.

Conclusion so far: It seems the projection update project can make an impact.


IvarStefansson commented Nov 18, 2024

This may be a stupid question: How will the slicing work for non-matching mortar grids?


keileg commented Nov 18, 2024

This may be a stupid question: How will the slicing work for non-matching mortar grids?

It will be more complicated, involving not only slicing but weighted sums over rows. I haven't worked out the details, but if it gets too complicated, we may end up keeping the old solution for non-matching grids, at least until we start using these in earnest. A sketch of such a weighted row sum follows below.
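A minimal, hypothetical sketch of such a weighted row sum, not tied to the PorePy data structures: each target cell receives contributions from several source cells, accumulated here with np.add.at:

import numpy as np

source_vals = np.array([1.0, 2.0, 3.0, 4.0])
# One entry per (source cell, target cell) overlap; a target cell may appear
# several times, producing a weighted sum over rows.
source_ind = np.array([0, 1, 2, 3])
target_ind = np.array([0, 0, 1, 1])
weights = np.array([0.5, 0.5, 0.25, 0.75])

result = np.zeros(2)
np.add.at(result, target_ind, weights * source_vals[source_ind])
# result == [0.5 * 1 + 0.5 * 2, 0.25 * 3 + 0.75 * 4] == [1.5, 3.75]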


keileg commented Nov 19, 2024

Benchmarking of the actual slicing and reconstruction

There turned out to be two natural ways to implement these operations:

  1. Operating directly on the sparse data structures, as outlined in the initial posting of this issue. Referred to as 'Option 1' below.
  2. Explicitly constructing a restriction/prolongation matrix. Referred to as 'Option 2' below.

Comments:

  • Option 2 is similar to what is currently done in pp.ad.SubdomainProjections; however, if we go for this solution, we will not rely on sps.bmat to construct a combined matrix for multiple subdomains, but rather concatenate indices and construct a single matrix (which could be substantially faster).
  • Option 1 is rather straightforward to implement for a conforming grid, but will become highly technical for non-conforming grids, where the matrix elements of the corresponding projection matrix are non-unitary. In practice, we would end up trying to implement a general matrix-matrix (or matrix-vector) product, and we can hardly expect to improve on scipy there. The reason we may improve things for conforming grids is that in this case the matrix product is in effect slicing, and this insight may be exploited (see the sketch after this list).
  • If we ever introduce a PETSc backend for the Ad operations, Option 2 may benefit from this (assuming PETSc speeds up calculations). Option 1 will require that we figure out how to efficiently manipulate the PETSc sparse matrix format.
  • Based on inspection of the source code, both options are expected to give some improvement over the current implementation of SubdomainProjection, and significant improvements over MortarProjection (which is horribly inefficient).
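For concreteness, a minimal sketch of the two options applied to row restriction of a sparse matrix (assuming unit weights, i.e. a conforming grid; sizes are arbitrary):

import numpy as np
import scipy.sparse as sps

A = sps.random(1000, 1000, density=0.01, format="csr")
rows = np.arange(0, 1000, 2)  # Restrict to every other row.

# Option 1: operate directly on the sparse data structure; scipy's row
# indexing is in effect slicing of the underlying CSR arrays.
restricted_1 = A[rows]

# Option 2: explicitly construct a restriction matrix, then multiply.
P = sps.csr_matrix(
    (np.ones(rows.size), (np.arange(rows.size), rows)), shape=(rows.size, 1000)
)
restricted_2 = P @ A

assert (restricted_1 != restricted_2).nnz == 0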

The benchmarks:

Two md geometries were considered:

  1. The regular geometry (case 1) from the 2d flow benchmark, with 400, 24K and 2400K cells in total.
  2. Case 3 from the 3d flow benchmarks, with 30K, 150K and 350K cells in total.

We test both restriction to and prolongation from each subdomain. This is representative of both subdomain and mortar projections, under the assumption that the task of identifying the indices to project to and from is treated separately from the mapping itself. By similar reasoning, mapping to sets of domains is also covered.

We comment below on the time for initialization, and for mapping of vectors, sparse matrices and AdArrays. In Option 2, initialization entails construction of a projection matrix, while Option 1 only stores the passed information during initialization. Thus the relative importance of the initialization cost depends on how many times a restriction/prolongation object is used; currently this is often only once, but future caching (#1181) may change this.

Results

Summarized as follows:

  1. Generally, prolongation is more costly than restriction, reflecting the need to construct and operate on larger vectors and matrices. The difference increases with cell count.
  2. Initialization is fastest with Option 1, with a cost of roughly 2-20% of that for Option 2.
  3. Restricting and prolonging a vector with Option 1 takes 20-100% of the corresponding time for Option 2. For Option 1, the times for vector operations and initialization are of the same order of magnitude (though the operations scale more poorly than initialization), while for Option 2, initialization can be 1-2 orders of magnitude more costly.
  4. For matrix operations, the scaling is roughly linear for Option 1, while for Option 2 it is in general slightly worse than linear. In total, the cost for Option 1 is about 20-100% of that for Option 2, with the difference increasing with matrix size. Testing on matrices with different numbers of nonzeros per row indicated that Option 1 can be considerably better than Option 2 in some respects (improved prolongation; no difference in behavior for restriction). The cost of initialization is negligible compared to the matrix operations.
  5. For Ad arrays, Option 1 takes 50-100% of the time for Option 2, with better scaling with cell count.

Conclusions

  1. Option 1 is significantly faster than Option 2, in particular for larger grids where the difference matters the most. Option 1 is therefore the preferred method.
  2. The need to stay compatible with non-conforming grids means we will also need Option 2. The projection objects will need to decide which restrictor / prolongator to use based on the provided weights (see the sketch below).
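A minimal sketch of that dispatch, assuming the RestrictionSlicer above for Option 1 and a hypothetical MatrixRestrictor class for Option 2:

import numpy as np


def make_restrictor(target_indices: np.ndarray, weights: np.ndarray):
    if np.allclose(weights, 1.0):
        # Unit weights: the cheap slicing-based Option 1 applies.
        return RestrictionSlicer(dim=1, target_indices=target_indices, weights=weights)
    # Non-unitary weights (non-conforming grids): fall back to an explicit
    # matrix representation (Option 2); MatrixRestrictor is hypothetical.
    return MatrixRestrictor(target_indices, weights)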

Next steps

  1. Implement the new classes for restriction and prolongation as helpers in grid_operators.py.
  2. Refactor the SubdomainProjection to use the new classes. This will entail deciding which indices to use based on input to the methods cell_prolongation() etc. Note that for the SubdomainProjection, the weights will always be unitary.
  3. Introduce tests of the new classes.
  4. Do benchmarking to compare performance with the existing implementation.
  5. Start thinking about the MortarProjection.


keileg commented Jan 3, 2025

Update, finally: Projections from mortar to secondary cannot use the slicing trick straight away, since even for conforming grids this is a mapping from two mortar cells to a single lower-dimensional cell. In retrospect, this is fairly obvious.

In the conforming case, it could be possible to do some clever tricks with indices that exploit the 2-1 correspondence between mortar and secondary grids (as opposed to a general many-to-many relation for non-conforming grids); see the sketch below. However, the benefit of doing this is unclear, while the cost measured in development time and code complexity may be substantial.

For now, the goal of the project is therefore to roll out the slicing projections for subdomains, and for mapping of mortars to primary in the conforming case. Mapping from mortar to secondary will stay in matrix form, as will all mortar mappings for non-conforming grids. However, we will also change the implementation so that the projection matrices are created on demand, rather than at initialization (this should cut a substantial part of the computational cost).
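To illustrate the 2-1 correspondence in the conforming case (a hypothetical sketch, assuming an ordering where the two mortar sides are stacked): each secondary cell receives a sum of exactly two mortar cells,

import numpy as np

num_secondary = 3
mortar_vals = np.arange(2 * num_secondary, dtype=float)  # Two sides stacked.
# Sum the two mortar cells overlying each secondary cell.
secondary_vals = mortar_vals[:num_secondary] + mortar_vals[num_secondary:]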


keileg commented Jan 8, 2025

When rolling out the new operator to the full set of constitutive laws, the following issue popped up: In contrast to the other Ad operators, the slicer operator cannot act as the right operand in an expression A x B (where x is an arithmetic operator). This means that when the restriction operator is the middle operand in an expression A x B x C, Python's left-to-right evaluation of operators of equal precedence will require evaluation of A x B first, which is not really meaningful. A minimal demonstration follows below.
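A stripped-down stand-in for the slicer (only __matmul__ is defined, as in the sketches above) shows the problem:

import numpy as np


class Slicer:
    def __init__(self, indices: np.ndarray) -> None:
        self._indices = indices

    def __matmul__(self, other: np.ndarray) -> np.ndarray:
        return other[self._indices]


A = np.ones((2, 2))
S = Slicer(np.array([0]))
v = np.ones(2)

S @ (A @ v)  # Fine: the slicer acts on an already evaluated vector.
# A @ S @ v  # Evaluates A @ S first, with the slicer as the right operand;
#            # in the Ad framework this intermediate is not meaningful, and
#            # here it simply fails.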

Alternatives:

  1. Add parentheses at all places where a slicer may be applied. This is not a good option.
  2. Try to hack the operator parsing system to circumvent the issue. This would be yet another hack in a code of low technical quality.
  3. Drop the implementation of the slicer and roll out the improved version of the mortar projections, with construction of projection matrices when projections are made, not when the projection object is created (see the sketch after this list). For small and medium-sized problems, this should capture the majority of the potential computational savings of the whole project.
  4. Put the project on hold pending implementation of Implement an Ad parser class #1181. This will allow us to control the order of parsing without resorting to option 2 above.
  5. Combine 3. and 4. This should be fully feasible, though it will require some technical work with git.
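A minimal sketch of the on-demand construction mentioned in option 3, using functools.cached_property so the matrix is built at first use rather than when the projection object is created (class and attribute names are hypothetical):

from functools import cached_property

import numpy as np
import scipy.sparse as sps


class LazyProjection:
    def __init__(self, target_indices: np.ndarray, num_source: int) -> None:
        self._target_indices = target_indices
        self._num_source = num_source

    @cached_property
    def matrix(self) -> sps.csr_matrix:
        # Built on first access and cached thereafter; initialization stays cheap.
        sz = self._target_indices.size
        return sps.csr_matrix(
            (np.ones(sz), (np.arange(sz), self._target_indices)),
            shape=(sz, self._num_source),
        )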

If option 5 still seems like a good solution in a few days, I will go with that.


keileg commented Jan 9, 2025

Decision: Go with 5.


keileg commented Jan 13, 2025

This issue was partly solved by #1301. Move to backlog, awaiting #1181.
