Commit 8b931f6

Minor text edits, including typo corrections, to the technical articles for clarity
1 parent c0f0f28 commit 8b931f6

2 files changed: +23 −30 lines changed

documentation/AI-Verification-Convexity.md

+21 −28
@@ -147,7 +147,7 @@ activation functions evolving the state, such as $tanh$  activation layers
 (see layer “nca\_0” in Fig 4). As with FICNNs, the weights in certain
 parts of the network are constrained to be non-negative to maintain the
 partial convexity property. In the figure above, the weight matrices for
-the fully connected layer “fc\_z\_+\_1” is constrained to be positive
+the fully connected layer “fc\_z\_+\_1” are constrained to be positive
 (as indicated by the “\_+\_” in the layer name). All other fully
 connected weight matrices in Fig 4 are unconstrained, giving freedom to
 fit any purely feedforward network – see proposition 2 [1]. Note again that in our implementation, the final activation function, $g_k$, is not applied. This still guarantees partial convexity but removes the restriction that outputs of the network must be non-negative.
@@ -180,7 +180,7 @@ $f((1−\lambda)x+\lambda y) \leq (1−\lambda)f(x)+ \lambda f(y)$. Interval

 $$ f(x) \leq max(f(a),f(b)) $$

-To find the minimum of $f$ on the interval, you could use a optimization routine, such as projected gradient descent, interior-point
+To find the minimum of $f$ on the interval, you could use an optimization routine, such as projected gradient descent, interior-point
 methods, or barrier methods. However, you can use the properties of
 convex functions to accelerate the search in certain scenarios.

@@ -190,7 +190,7 @@ If $f(a) \gt f(b)$, then either the minimum is at $x=b$ or
 the minimum lies strictly in the interior of the interval,
 $x \in (a,b)$. To assess whether the minimum is at $x=b$, look at the derivative, $\nabla f(x)$, at the interval bounds. If $f$ is not differentiable
 at the interval bounds, for example if the network has relu activation
-functions that defines a set of non-differentiable points in $\mathbb{R}$, evaluate
+functions that define a set of non-differentiable points in $\mathbb{R}$, evaluate
 both the left and right derivative of $f$ at the interval bounds instead.
 Then examine the sign of the directional derivatives at the interval bounds,
 directed to the interior of the interval: $sgn( \nabla f(a), -\nabla f(b) ) = (\pm , \pm)$. Note that the sign of 0 is taken as positive in this discussion.
@@ -240,7 +240,7 @@ possible sign combinations since, at $x=b$, convexity means that $-\nabla f(b+\e

 In the case that $f(a) = f(b)$, the function is either
 constant, in which case the minimum is $f(a) = f(b)$, or the minimum again
-lies at the interior. If $sgn(\nabla f(a)) = +$, then $\nabla f(a) = 0$ else this violates convexity since $f(a) = f(b)$. Similar is true for
+lies in the interior. If $sgn(\nabla f(a)) = +$, then $\nabla f(a) = 0$, else this violates convexity since $f(a) = f(b)$. The same is true for
 $-sgn(\nabla f(b)) = +$. In this case, all sign combinations are possible
 owing to possible non-differentiability of $f$ at the interval bounds:

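The endpoint test above can be made concrete with a short sketch. This is illustrative only (the repository's implementation is in MATLAB and the helper names here are hypothetical): one-sided derivatives are approximated by finite differences, the $sgn(0) = +$ convention from the text is used, and `scipy.optimize.minimize_scalar` stands in for the convex optimization fallback when the minimum is interior.

```python
import numpy as np
from scipy.optimize import minimize_scalar

def sgn(x):
    """Sign with the convention sgn(0) = +1, as in the text."""
    return 1.0 if x >= 0 else -1.0

def one_sided_derivatives(f, x, eps=1e-6):
    """Finite-difference approximations of the left and right derivatives of f at x."""
    left = (f(x) - f(x - eps)) / eps
    right = (f(x + eps) - f(x)) / eps
    return left, right

def lower_bound_on_interval(f, a, b):
    """Lower bound of a convex 1-D function f on [a, b] from endpoint derivative signs."""
    _, da_right = one_sided_derivatives(f, a)   # inward direction at a is +x
    db_left, _ = one_sided_derivatives(f, b)    # inward direction at b is -x
    if sgn(da_right) > 0:
        return f(a)   # f is non-decreasing on [a, b]: minimum at x = a
    if sgn(-db_left) > 0:
        return f(b)   # f is non-increasing on [a, b]: minimum at x = b
    # Both inward derivatives are negative: the minimum is interior.
    return minimize_scalar(f, bounds=(a, b), method="bounded").fun

f = lambda x: (x - 0.3) ** 2                   # a simple convex test function
print(lower_bound_on_interval(f, -1.0, 1.0))   # ≈ 0.0, interior minimum
print(lower_bound_on_interval(f, 0.5, 1.0))    # 0.04, minimum at the left bound
```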
@@ -262,8 +262,8 @@ convex functions.

 This idea can be extended to many intervals. Take a 1-dimensional ICNN. Consider subdividing the
 operational design domain into a union of intervals $I_i$, where $I_i = [a_i,a_{i+1}]$ and $a_i \lt a_{i+1}$. A tight lower and upper bound on each interval can be computed with a
-single forward pass through the network of all interval bounds values in the union of intervals, a
-single backward pass through the network to compute derivatives at the interval bounds values, and
+single forward pass through the network of all interval boundary values in the union of intervals, a
+single backward pass through the network to compute derivatives at the interval boundary values, and
 one final convex optimization on the interval containing the global
 minimum. Furthermore, since bounds are computed from forward and
 backward passes through the network, you can compute a 'boundedness metric' during
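A compact sketch of this multi-interval procedure (NumPy, for illustration only; the repository's implementation is in MATLAB and the function names here are hypothetical). It assumes a differentiable convex $f$ with a callable gradient; for relu networks the one-sided derivatives discussed above would be used instead.

```python
import numpy as np

def interval_bounds(f, grad_f, grid):
    """Per-interval upper/lower bounds of a convex 1-D function over a grid.

    grid is an increasing array of interval boundary values a_0 < a_1 < ... .
    Returns (upper, lower, needs_opt), where needs_opt flags intervals whose
    lower bound still requires a convex optimization (interior minimum).
    """
    vals = f(grid)          # one "forward pass" over all boundary values
    grads = grad_f(grid)    # one "backward pass" for all derivatives
    fa, fb = vals[:-1], vals[1:]
    ga, gb = grads[:-1], grads[1:]

    upper = np.maximum(fa, fb)                            # max of endpoint values
    lower = np.where(ga >= 0, fa,                         # minimum at left bound
                     np.where(gb <= 0, fb, -np.inf))      # minimum at right bound, else unknown
    needs_opt = ~np.isfinite(lower)                       # optimize only these intervals
    return upper, lower, needs_opt

grid = np.linspace(-1.0, 1.0, 5)
f = lambda x: (x - 0.3) ** 2
grad_f = lambda x: 2.0 * (x - 0.3)
print(interval_bounds(f, grad_f, grid))   # exactly one interval needs optimization
```

Because the derivative of a convex function is non-decreasing, at most one interval can fail the endpoint test (the one containing a strictly interior global minimum), so a single convex optimization completes the lower bounds.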
@@ -279,29 +279,28 @@ and $sgn(0) = +$.
 The previous discussion focused on 1-dimensional convex functions; however, this idea extends to n-dimensional convex functions, $f:\mathbb{R}^n \rightarrow \mathbb{R}$. Note that a vector valued convex function is
 convex in each output, so it is sufficient to keep the target as $\mathbb{R}$. In the discussion in this section, take the convex set to be the n-dimensional hypercube, $H_n$, with vertices, $V_n = \{(\pm 1,\pm 1, \dots,\pm 1)\}$. General convex hulls will be discussed later.

-An important property of convex functions in n-dimensions is that every 1-dimension restriction also defines a convex function. This is easily seen from the
+An important property of convex functions in n-dimensions is that every 1-dimensional restriction also defines a convex function. This is easily seen from the
 definition. Define $g:\mathbb{R} \rightarrow \mathbb{R}$ as $g(t) = f(t\hat{n}) \text{ where } \hat{n}$ is
 some unit vector in $\mathbb{R}^n$. Then, by definition of convexity of $f$, letting $x = t\hat{n}$ and $y = t'\hat{n}$, it follows that,

 $$ g((1−\lambda)t+\lambda t') \leq (1−\lambda)g(t)+ \lambda g(t') $$

-Note that the restriction to 1-dimensional convex function will be used several times in the following discussion.
+Note that the restriction to 1-dimensional convex functions will be used several times in the following discussion.

 To determine an upper bound of $f$ on the hypercube, note that any point in $H_n$ can be expressed as a convex combination of its vertices, i.e., for $z \in H_n$, it follows that $z = \sum_i \lambda_i v_i$ where $\sum_i \lambda_i = 1$ and $v_i \in V_n$. Therefore, using the definition of convexity in the first inequality and that $\sum_i \lambda_i = 1$ with $\lambda_i \geq 0$ in the second inequality,

 $$ f(z) = f(\sum_i \lambda_i v_i) \leq \sum_i \lambda_i f(v_i) \leq \underset{v \in V_n}{\text{max }} f(v) $$

-Consider now the lower bound of $f$ over the hypercube. Here we take the
-approach of looking for cases where there is a guarantee that the
-minimum lies at a vertex of the hypercube and when this guarantee cannot
-be met, falling back to solving the convex optimization over this
-hypercubic domain. For the n-dimensional approach, we will split the
+Consider now the lower bound of $f$ over a hypercubic grid. Here we take the
+approach of looking for hypercubes where there is a guarantee that the
+minimum lies at a vertex of the hypercube and, when this guarantee is not met, fall back to solving the convex optimization over that particular
+hypercube. For the n-dimensional approach, we will split the
 discussion into differentiable and non-differentiable $f$, and consider
 these separately.

 **Multi-Dimensional Differentiable Convex Functions**

-Consider the derivatives evaluated at each vertex of the hypercube. For each $\nabla f(v)$, $v \in V_n$, take the directional derivatives,
+Consider the derivatives evaluated at each vertex of a hypercube. For each $\nabla f(v)$, $v \in V_n$, take the directional derivatives,
 pointing inward along a hypercubic edge. Without loss of generality,
 recall $V_n = \{(±1,±1,…,±1) \in \mathbb{R}^n\}$ and therefore
 the hypercube is aligned along the standard basis vectors
@@ -340,10 +339,8 @@ derivative along the line at $w$, pointing inwards, is given by,

 $$ \hat{n} \cdot \nabla f(w) = \sum_i -|n_i|\cdot sgn(w_i) \cdot \nabla_i f(w) = \sum_i |n_i| \cdot (-sgn(w_i) \cdot \nabla_i f(w)) \geq 0 $$

-is positive, as $\hat{n} = - |n_i| \cdot sgn(w_i) \cdot e_i $.
-The properties proved previously can then by applied to this 1-dimensional restriction, i.e., if the
-gradient of $f$ as the interval bounds of an interval is positive, then $f$ has
-a minimum value at this interval bounds. Hence, a vertex with inward
+and is positive, as $\hat{n} = -\sum_i |n_i| \cdot sgn(w_i) \cdot e_i$.
+The properties proved previously can then be applied to this 1-dimensional restriction. Hence, a vertex with inward
 directional derivative signature $(+,+,…,+)$ is a lower bound for $f$ over the hypercube. ◼

 If there are multiple vertices sharing this signature, then since every
@@ -354,9 +351,9 @@ at vertices sharing these signatures so it is sufficient to select any.

 If no vertex has signature $(+,+,…,+)$, solve for the minimum using
 a convex optimization routine over this hypercube. Since all local minima are
-global minima, there is at least one hypercube requiring this solution.
+global minima, there is at least one hypercube requiring this approach.
 If the function has a flat section at its minima, there may be other
-hypercubes in the operational design domain, also without a vertex with all positive signature. Note that empirically,
+hypercubes, also without a vertex with an all-positive signature. Note that empirically,
 this seldom happens for convex neural networks as it requires fine
 tuning of the parameters to create such a landscape.

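The vertex-signature test and its optimization fallback can be sketched as follows (NumPy/SciPy, illustrative only; the repository implements this in MATLAB and the helper names here are hypothetical). The gradient is assumed to be available, for example from a backward pass, and the hypercube is axis-aligned.

```python
import numpy as np
from itertools import product
from scipy.optimize import minimize

def hypercube_bounds(f, grad_f, center, half_width):
    """Upper/lower bounds of a convex, differentiable f over an axis-aligned hypercube."""
    n = len(center)
    vertices = np.array([center + half_width * np.array(s)
                         for s in product((-1.0, 1.0), repeat=n)])
    vals = np.array([f(v) for v in vertices])        # values at all 2^n vertices
    grads = np.array([grad_f(v) for v in vertices])  # gradients at all vertices

    upper = vals.max()                               # upper bound: max over vertices

    # Inward directional derivatives along the edges: -sgn(v_i - c_i) * df/dx_i.
    inward = -np.sign(vertices - center) * grads
    all_positive = np.all(inward >= 0.0, axis=1)     # sgn(0) = + convention
    if all_positive.any():
        lower = vals[all_positive].min()             # such a vertex is the minimum
    else:
        # No vertex is guaranteed to be the minimum: solve the convex program.
        box = list(zip(center - half_width, center + half_width))
        lower = minimize(f, x0=np.array(center, dtype=float),
                         jac=grad_f, bounds=box).fun
    return lower, upper

p = np.array([0.2, -0.1])                            # interior minimizer
f = lambda x: float(np.sum((x - p) ** 2))
grad_f = lambda x: 2.0 * (x - p)
print(hypercube_bounds(f, grad_f, center=np.zeros(2), half_width=np.ones(2)))
```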
@@ -380,7 +377,7 @@ As depicted in figure 7, the vertices $w$ of the square (hypercube of dimension
 bisecting these directional derivatives, into the interior of the square, has a negative gradient. This is
 because the vertex is at the intersection of two planes and is a
 non-differentiable point, so the derivative through this point is path
-dependent. This is a well-known observation but this breaks the assertion that this vertex if the minimum of $f$ over this
+dependent. This is a well-known property of non-differentiable functions and breaks the assertion that this vertex is the minimum of $f$ over this
 square region. From this example, it is clear the minimum lies at the apex at $(0,0)$.

 To ameliorate this issue, in the case that the convex function is
@@ -391,13 +388,9 @@ $relu$ operations. In practice, this means that a vertex may be a
 non-differentiable point if the network has pre-activations to $relu$
 layers that have exact zeros. In practice, this is seldom the case. The
 probability of this occurring can be further reduced by offsetting any
-hypercube or hypercubic grid origin by a small random perturbation. It
-is assumed during training, for efficiency of computing bounds during training, that the convex neural network is differentiable everywhere. For final post-training analysis, this implementation checks the $relu$
-pre-activations for any exact zeros for all vertices. If there are
-any zeros in these pre-activations, lower bounds for hypercubes that contain that vertex are recomputed using
-an minimization routine. As a demonstration that these bounds are
-correct, in the examples, we also run the minimization optimization routine on every
-hypercube to show that bounds agree.
+hypercube or hypercubic grid origin by a small random perturbation. If the $relu$
+pre-activations at a vertex contain any exact zeros, lower bounds for hypercubes that contain that vertex can be recomputed using
+a convex optimization routine instead.

 As a final comment, for general convex hulls, the argument for the upper bound value of the function over the convex hull trivially extends, defined as the largest function value over the set of points defining the hull. The lower bound should be determined using an optimization routine, constrained to the set of points in the convex hull.
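The zero pre-activation check described above is simple to express; a hypothetical helper in the same illustrative NumPy style is shown here. It only flags which vertices touch a non-differentiable point; for hypercubes containing a flagged vertex, the lower bound would then be recomputed with a convex optimization routine, as stated in the text.

```python
import numpy as np

def vertices_with_zero_preactivations(preactivations):
    """Flag vertices whose relu pre-activations contain an exact zero.

    preactivations: list (one entry per relu layer) of arrays of shape
    (num_vertices, layer_width), collected during the forward pass over
    all hypercube vertices.
    """
    flagged = np.zeros(preactivations[0].shape[0], dtype=bool)
    for z in preactivations:
        flagged |= np.any(z == 0.0, axis=1)   # exact zero => non-differentiable point
    return flagged

# Example: two relu layers, three vertices; only the second vertex has an exact zero.
z1 = np.array([[0.3, -1.2], [0.0, 2.0], [1.5, 0.7]])
z2 = np.array([[-0.4, 0.9], [1.1, -0.2], [0.6, 0.8]])
print(vertices_with_zero_preactivations([z1, z2]))   # [False  True False]
```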

documentation/AI-Verification-Monotonicity.md

+2 −2
@@ -27,7 +27,7 @@ To circumvent these challenges, an alternative approach is to construct neural n
 - **Constrained Weights**: Ensuring that all weights in the network are non-negative can guarantee monotonicity. You can achieve this by using techniques like weight clipping or transforming weights during training.
 - **Architectural Considerations**: Designing network architectures that facilitate monotonic behavior. For example, architectures that avoid certain types of skip connections or layer types that could introduce non-monotonic behavior.

-The approach taken in this repository is to utilize a combination of these three aspects and is based on the construction outlined in [1]. Ref [1] discusses the derivation in the context of row vector representations of network inputs however MATLAB utilizes a column vector representation of network inputs. This means that the 1-norm discussed in [1] is replaced by the $\infty$-norm for implementations in MATLAB.
+The approach taken in this repository is to utilize a combination of activation-function, weight, and architectural restrictions, and is based on the construction outlined in [1]. Ref [1] discusses the derivation in the context of row vector representations of network inputs; however, MATLAB utilizes a column vector representation of network inputs. This means that the 1-norm discussed in [1] is replaced by the $\infty$-norm for implementations in MATLAB.

 Note that for different choices of p-norm, the derivation in [1] still yields a monotonic function $f$; however, there may be couplings between the magnitudes of the partial derivatives (shown for p=2 in [1]). By default, the implementation in this repository sets $p=\infty$ for monotonic networks but other values are explored as these may yield better fits.

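As a minimal illustration of the "non-negative weights plus monotonic activation" idea from the list above (this is not the construction from [1] that the repository actually uses, and the Python helper below is hypothetical; the repository itself is MATLAB-based), a toy fully connected network can be made monotonically non-decreasing in every input by reparameterizing the weights to be non-negative:

```python
import numpy as np

def relu(z):
    return np.maximum(z, 0.0)

def monotone_mlp(x, raw_weights, biases):
    """Toy fully connected network that is non-decreasing in every input:
    effective weights are made non-negative (here via abs) and the relu
    activation is itself monotonic. Inputs are column vectors, as in MATLAB."""
    h = x
    for i, (v, b) in enumerate(zip(raw_weights, biases)):
        w = np.abs(v)                    # transform raw weights to non-negative weights
        h = w @ h + b
        if i < len(raw_weights) - 1:     # no activation on the final layer
            h = relu(h)
    return h

# Quick empirical check: increasing the first input never decreases the output.
rng = np.random.default_rng(0)
raw_weights = [rng.standard_normal((4, 2)), rng.standard_normal((1, 4))]
biases = [rng.standard_normal((4, 1)), rng.standard_normal((1, 1))]
x = rng.standard_normal((2, 1))
dx = np.array([[0.5], [0.0]])
assert monotone_mlp(x + dx, raw_weights, biases) >= monotone_mlp(x, raw_weights, biases)
```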
@@ -50,7 +50,7 @@ The main challenge with expressive monotonic networks is to balance the inherent

 For networks constructed to be monotonic, verification becomes more straightforward and comes down to architectural and weight inspection, i.e., provided the network architecture is of a specified monotonic topology, and that the weights in the network are appropriately related - see [1] - then the network is monotonic.

-In summary, while verifying monotonicity in general neural networks is complex due to non-linearities and high dimensionality, constructing networks with inherent monotonic properties simplifies verification. By using monotonic activation functions and ensuring non-negative weights, you can design networks that are guaranteed to be monotonic, thus facilitating the verification process and making the network more suitable for applications where monotonic behavior is essential.
+In summary, while verifying monotonicity in general neural networks is complex due to non-linearities and high dimensionality, constructing networks with inherent monotonic properties simplifies verification. By using constrained architectures and weights, you can design networks that are guaranteed to be monotonic, thus facilitating the verification process and making the network more suitable for applications where monotonic behavior is essential.

 **References**
