Commit 3e6ce4a

write text
1 parent e9d49da commit 3e6ce4a

2 files changed: 65 additions & 22 deletions

docs/README.md

Lines changed: 1 addition & 0 deletions
@@ -8,5 +8,6 @@ To build these docs locally do:
 ```bash
 cd docs
 pip install -r requirements.txt
+conda install -c conda-forge pandoc
 make livedirhtml
 ```

examples/hello-world.ipynb

Lines changed: 64 additions & 22 deletions
@@ -8,29 +8,44 @@
 "# Hello, World!"
 ]
 },
+{
+"cell_type": "markdown",
+"id": "847730fa-390b-4b0a-8600-55fb76f9cc38",
+"metadata": {},
+"source": [
+"On this page, we will build a simple training loop to fit an MLP to some randomly generated data. We start by sampling some data. Modula uses JAX to handle array computations, so we use JAX to sample the data. JAX requires us to explicitly pass in the state of the random number generator."
+]
+},
 {
 "cell_type": "code",
-"execution_count": 5,
+"execution_count": 1,
 "id": "5a7a804b-06ec-4773-864c-db8a3b01c3e1",
 "metadata": {},
 "outputs": [],
 "source": [
 "import jax\n",
 "import jax.numpy as jnp\n",
 "\n",
-"input_dim = 28 * 28\n",
+"input_dim = 784\n",
 "output_dim = 10\n",
 "batch_size = 128\n",
 "\n",
-"# Generate random training data\n",
 "key = jax.random.PRNGKey(0)\n",
 "inputs = jax.random.normal(key, (input_dim, batch_size))\n",
 "targets = jax.random.normal(key, (output_dim, batch_size))"
 ]
 },
+{
+"cell_type": "markdown",
+"id": "3809ea7f-cd49-4b2f-98a9-0bcd420fbcac",
+"metadata": {},
+"source": [
+"Next, we will build our neural network. We import the basic Linear and ReLU modules and compose them using the `@` operator. Calling `mlp.jit()` tries to make all the internal module methods more efficient using [just-in-time compilation](https://jax.readthedocs.io/en/latest/jit-compilation.html) from JAX."
+]
+},
 {
 "cell_type": "code",
-"execution_count": 4,
+"execution_count": 2,
 "id": "a7a14a1b-1428-4432-8e89-6b7cfed3d765",
 "metadata": {},
 "outputs": [
@@ -53,43 +68,70 @@
 "width = 256\n",
 "\n",
 "mlp = Linear(output_dim, width)\n",
-"mlp @= ReLU() @ Linear(width, width) \n",
-"mlp @= ReLU() @ Linear(width, input_dim)\n",
+"mlp @= ReLU() \n",
+"mlp @= Linear(width, width) \n",
+"mlp @= ReLU() \n",
+"mlp @= Linear(width, input_dim)\n",
 "\n",
 "print(mlp)\n",
 "\n",
 "mlp.jit()"
 ]
 },
+{
+"cell_type": "markdown",
+"id": "a7af8a3d-77dc-4007-a6de-03ab617bf3fa",
+"metadata": {},
+"source": [
+"Next, we choose our error measure. Error measures allow us to compute both the loss of the model and the derivative of the loss with respect to the model outputs. For simplicity we will just use squared error."
+]
+},
 {
 "cell_type": "code",
-"execution_count": 12,
+"execution_count": 3,
+"id": "a7ea38a1-2684-437f-88fc-cb1f2a44133c",
+"metadata": {},
+"outputs": [],
+"source": [
+"from modula.error import SquareError\n",
+"\n",
+"error = SquareError()"
+]
+},
+{
+"cell_type": "markdown",
+"id": "1c4b8252-b3f0-4d16-9b48-9d8d582c1abe",
+"metadata": {},
+"source": [
+"Finally, we are ready to train our model. The method `mlp.backward` takes as input the weights, the activations, and the gradient of the error. It returns the gradient of the loss with respect to both the model weights and the inputs. The method `mlp.dualize` takes in the gradient of the weights and solves for the vector of unit modular norm that maximizes the linearized improvement in loss."
+]
+},
+{
+"cell_type": "code",
+"execution_count": 4,
 "id": "080bbf4f-0b73-4d6a-a3d5-f64a2875da9c",
 "metadata": {},
 "outputs": [
 {
 "name": "stdout",
 "output_type": "stream",
 "text": [
-"Step 0, Loss 0.9790326952934265\n",
-"Step 100, Loss 0.0018738203216344118\n",
-"Step 200, Loss 0.0014391584554687142\n",
-"Step 300, Loss 0.0010814154520630836\n",
-"Step 400, Loss 0.0008106177556328475\n",
-"Step 500, Loss 0.0005738214822486043\n",
-"Step 600, Loss 0.0003808117180597037\n",
-"Step 700, Loss 0.00022766715846955776\n",
-"Step 800, Loss 0.00011454012565081939\n",
-"Step 900, Loss 3.979807297582738e-05\n"
+"Step 0 \t Loss 0.976274\n",
+"Step 100 \t Loss 0.001985\n",
+"Step 200 \t Loss 0.001541\n",
+"Step 300 \t Loss 0.001189\n",
+"Step 400 \t Loss 0.000884\n",
+"Step 500 \t Loss 0.000625\n",
+"Step 600 \t Loss 0.000413\n",
+"Step 700 \t Loss 0.000251\n",
+"Step 800 \t Loss 0.000130\n",
+"Step 900 \t Loss 0.000049\n"
 ]
 }
 ],
 "source": [
-"from modula.error import SquareError\n",
-"\n",
 "steps = 1000\n",
 "learning_rate = 0.1\n",
-"error = SquareError()\n",
 "\n",
 "key = jax.random.PRNGKey(0)\n",
 "w = mlp.initialize(key)\n",
@@ -118,7 +160,7 @@
 "    w = [weight - lr * d_weight for weight, d_weight in zip(w, d_w)]\n",
 "\n",
 "    if step % 100 == 0:\n",
-"        print(f\"Step {step}, Loss {loss}\")\n"
+"        print(f\"Step {step:3d} \\t Loss {loss:.6f}\")\n"
 ]
 }
 ],
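
This hunk only touches the tail of the training loop, so most of the loop is not visible in the diff. Based on the methods the new markdown cell names (`mlp.initialize`, `mlp.backward`, `mlp.dualize`, `SquareError`) and the update line shown above, here is a rough sketch of how the pieces might fit together; it relies on `mlp`, `error`, `inputs`, and `targets` defined earlier in the notebook, and the forward-pass and error-gradient calls (`mlp(inputs, w)`, `error(...)`, `error.grad(...)`) are assumed names for illustration, not confirmed Modula API.

```python
# Hypothetical sketch; only mlp.initialize, mlp.backward, mlp.dualize,
# SquareError, and the weight-update line are taken from the diff.
# The forward and error calls below are assumptions, not confirmed API.
key = jax.random.PRNGKey(0)
w = mlp.initialize(key)

lr = 0.1  # the notebook sets learning_rate = 0.1
for step in range(1000):
    outputs, activations = mlp(inputs, w)        # assumed forward-pass signature
    loss = error(outputs, targets)               # assumed loss call
    error_grad = error.grad(outputs, targets)    # assumed error-gradient call

    # backward: gradient of the loss w.r.t. the weights (and the inputs)
    grad_w, grad_inputs = mlp.backward(w, activations, error_grad)

    # dualize: turn the weight gradient into a unit-modular-norm update direction
    d_w = mlp.dualize(grad_w)

    # update step as shown in the hunk above
    w = [weight - lr * d_weight for weight, d_weight in zip(w, d_w)]

    if step % 100 == 0:
        print(f"Step {step:3d} \t Loss {loss:.6f}")
```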
@@ -138,7 +180,7 @@
 "name": "python",
 "nbconvert_exporter": "python",
 "pygments_lexer": "ipython3",
-"version": "3.12.8"
+"version": "3.10.16"
 }
 },
 "nbformat": 4,
