{
 "cells": [
  {
   "cell_type": "markdown",
   "source": [
    "# Multimodal data: Image classification as an example\n",
    "\n",
    "We take the MNIST image classification task as an example of loading multimodal data. This tutorial is for those that have read all parts in \"Get Started\" and advanced parts in \"Advanced Usage\" including \"New data derivers\" and \"Customized model base\".\n",
    "\n",
    "Although we support multimodal data, multimodal models are currently not integrated as part of the package (that's why this part is in \"Advanced Usage\"). `pytorch_widedeep` (`WideDeep` in this package) and `autogluon` (`AutoGluon` in this package) support some multimodal models. If you are willing to develop multimodal models or add support to model bases, you are welcome to contribute on GitHub."
   ],
   "metadata": {
    "collapsed": false,
    "pycharm": {
     "name": "#%% md\n"
    }
   }
  },
  {
   "cell_type": "code",
   "execution_count": 1,
   "metadata": {
    "collapsed": true,
    "pycharm": {
     "name": "#%%\n"
    }
   },
   "outputs": [],
   "source": [
    "import tabensemb\n",
    "import torch\n",
    "import os\n",
    "from tempfile import TemporaryDirectory\n",
    "\n",
    "temp_path = TemporaryDirectory()\n",
    "tabensemb.setting[\"default_output_path\"] = os.path.join(temp_path.name, \"output\")\n",
    "tabensemb.setting[\"default_config_path\"] = os.path.join(temp_path.name, \"configs\")\n",
    "tabensemb.setting[\"default_data_path\"] = os.path.join(temp_path.name, \"data\")\n",
    "\n",
    "device = \"cuda\" if torch.cuda.is_available() else \"cpu\""
   ]
  },
  {
   "cell_type": "markdown",
   "source": [
    "The following code is copied from [an official example](https://github.com/pytorch/examples/blob/main/mnist/main.py) of `pytorch` that defines the network and transformation of images and downloads the dataset.\n",
    "\n",
    "**Remark**: Note that the `Net` returns logits instead of the `log_softmax` transformed values in the official example for compatibility with the framework. We have emphasized this in \"Customized model base\"."
   ],
   "metadata": {
    "collapsed": false,
    "pycharm": {
     "name": "#%% md\n"
    }
   }
  },
  {
   "cell_type": "code",
   "execution_count": 2,
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Downloading http://yann.lecun.com/exdb/mnist/train-images-idx3-ubyte.gz\n",
      "Downloading http://yann.lecun.com/exdb/mnist/train-images-idx3-ubyte.gz to /tmp/tmpcdcd59k_/data/MNIST/raw/train-images-idx3-ubyte.gz\n"
     ]
    },
    {
     "data": {
      "text/plain": "  0%|          | 0/9912422 [00:00<?, ?it/s]",
      "application/vnd.jupyter.widget-view+json": {
       "version_major": 2,
       "version_minor": 0,
       "model_id": "3f03bb96d3ce4c20bcc81958927fcf09"
      }
     },
     "metadata": {},
     "output_type": "display_data"
    },
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Extracting /tmp/tmpcdcd59k_/data/MNIST/raw/train-images-idx3-ubyte.gz to /tmp/tmpcdcd59k_/data/MNIST/raw\n",
      "\n",
      "Downloading http://yann.lecun.com/exdb/mnist/train-labels-idx1-ubyte.gz\n",
      "Downloading http://yann.lecun.com/exdb/mnist/train-labels-idx1-ubyte.gz to /tmp/tmpcdcd59k_/data/MNIST/raw/train-labels-idx1-ubyte.gz\n"
     ]
    },
    {
     "data": {
      "text/plain": "  0%|          | 0/28881 [00:00<?, ?it/s]",
      "application/vnd.jupyter.widget-view+json": {
       "version_major": 2,
       "version_minor": 0,
       "model_id": "9956944d9d6f4639890fdaeee93634b7"
      }
     },
     "metadata": {},
     "output_type": "display_data"
    },
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Extracting /tmp/tmpcdcd59k_/data/MNIST/raw/train-labels-idx1-ubyte.gz to /tmp/tmpcdcd59k_/data/MNIST/raw\n",
      "\n",
      "Downloading http://yann.lecun.com/exdb/mnist/t10k-images-idx3-ubyte.gz\n",
      "Downloading http://yann.lecun.com/exdb/mnist/t10k-images-idx3-ubyte.gz to /tmp/tmpcdcd59k_/data/MNIST/raw/t10k-images-idx3-ubyte.gz\n"
     ]
    },
    {
     "data": {
      "text/plain": "  0%|          | 0/1648877 [00:00<?, ?it/s]",
      "application/vnd.jupyter.widget-view+json": {
       "version_major": 2,
       "version_minor": 0,
       "model_id": "15e42ffc21d449aeb986934d763679c6"
      }
     },
     "metadata": {},
     "output_type": "display_data"
    },
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Extracting /tmp/tmpcdcd59k_/data/MNIST/raw/t10k-images-idx3-ubyte.gz to /tmp/tmpcdcd59k_/data/MNIST/raw\n",
      "\n",
      "Downloading http://yann.lecun.com/exdb/mnist/t10k-labels-idx1-ubyte.gz\n",
      "Downloading http://yann.lecun.com/exdb/mnist/t10k-labels-idx1-ubyte.gz to /tmp/tmpcdcd59k_/data/MNIST/raw/t10k-labels-idx1-ubyte.gz\n"
     ]
    },
    {
     "data": {
      "text/plain": "  0%|          | 0/4542 [00:00<?, ?it/s]",
      "application/vnd.jupyter.widget-view+json": {
       "version_major": 2,
       "version_minor": 0,
       "model_id": "d6b8e6d6fe5d43918728c24f6d0f30de"
      }
     },
     "metadata": {},
     "output_type": "display_data"
    },
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Extracting /tmp/tmpcdcd59k_/data/MNIST/raw/t10k-labels-idx1-ubyte.gz to /tmp/tmpcdcd59k_/data/MNIST/raw\n",
      "\n"
     ]
    }
   ],
   "source": [
    "import torch\n",
    "import torch.nn as nn\n",
    "import torch.nn.functional as F\n",
    "from torchvision import datasets, transforms\n",
    "\n",
    "class Net(nn.Module):\n",
    "    def __init__(self):\n",
    "        super(Net, self).__init__()\n",
    "        self.conv1 = nn.Conv2d(1, 32, 3, 1)\n",
    "        self.conv2 = nn.Conv2d(32, 64, 3, 1)\n",
    "        self.dropout1 = nn.Dropout(0.25)\n",
    "        self.dropout2 = nn.Dropout(0.5)\n",
    "        self.fc1 = nn.Linear(9216, 128)\n",
    "        self.fc2 = nn.Linear(128, 10)\n",
    "\n",
    "    def forward(self, x):\n",
    "        x = self.conv1(x)\n",
    "        x = F.relu(x)\n",
    "        x = self.conv2(x)\n",
    "        x = F.relu(x)\n",
    "        x = F.max_pool2d(x, 2)\n",
    "        x = self.dropout1(x)\n",
    "        x = torch.flatten(x, 1)\n",
    "        x = self.fc1(x)\n",
    "        x = F.relu(x)\n",
    "        x = self.dropout2(x)\n",
    "        x = self.fc2(x)\n",
    "        return x\n",
    "\n",
    "transform=transforms.Compose([\n",
    "        transforms.ToTensor(),\n",
    "        transforms.Normalize((0.1307,), (0.3081,))\n",
    "        ])\n",
    "\n",
    "dataset1 = datasets.MNIST(os.path.join(temp_path.name, \"data\"), train=True, download=True, transform=transform)\n",
    "dataset2 = datasets.MNIST(os.path.join(temp_path.name, \"data\"), train=False, transform=transform)"
   ],
   "metadata": {
    "collapsed": false,
    "pycharm": {
     "name": "#%%\n"
    }
   }
  },
  {
   "cell_type": "markdown",
   "source": [
    "In this tutorial, the images are loaded into the memory."
   ],
   "metadata": {
    "collapsed": false,
    "pycharm": {
     "name": "#%% md\n"
    }
   }
  },
  {
   "cell_type": "code",
   "execution_count": 3,
   "outputs": [
    {
     "data": {
      "text/plain": "((70000, 28, 28), (70000,))"
     },
     "execution_count": 3,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "import numpy as np\n",
    "\n",
    "train_images = []\n",
    "train_targets = []\n",
    "test_images = []\n",
    "test_targets = []\n",
    "for img, target in dataset1:\n",
    "    train_images.append(img)\n",
    "    train_targets.append(target)\n",
    "for img, target in dataset2:\n",
    "    test_images.append(img)\n",
    "    test_targets.append(target)\n",
    "images_array = torch.concat(train_images + test_images, dim=0).numpy()\n",
    "targets_array = np.array(train_targets + test_targets)\n",
    "images_array.shape, targets_array.shape"
   ],
   "metadata": {
    "collapsed": false,
    "pycharm": {
     "name": "#%%\n"
    }
   }
  },
  {
   "cell_type": "markdown",
   "source": [
    "Under this framework, multimodal data is loaded through data derivers. For data derivers, to load images for each data point, we need a column (here we name it `image_index`) in the tabular dataset that indicates the location of the image. In our case, the location is the index of the image in `images_array`. In other cases, the location might be a path to the image in the drive.\n",
    "\n",
    "The MNIST dataset has a separate testing set (index>=60,000 in the `images_array` and `targets_array` defined above). We will use it after training to see the performance."
   ],
   "metadata": {
    "collapsed": false,
    "pycharm": {
     "name": "#%% md\n"
    }
   }
  },
  {
   "cell_type": "code",
   "execution_count": 4,
   "outputs": [],
   "source": [
    "import pandas as pd\n",
    "\n",
    "train_df = pd.DataFrame({\"image_index\": list(range(len(train_images))), \"target\": train_targets})\n",
    "test_df = pd.DataFrame({\"image_index\": list(range(len(train_images), len(train_images) + len(test_images))), \"target\": test_targets})"
   ],
   "metadata": {
    "collapsed": false,
    "pycharm": {
     "name": "#%%\n"
    }
   }
  },
  {
   "cell_type": "markdown",
   "source": [
    "The data deriver to load images is very simple. Multimodal data is not in the tabular data, so `stacked=False` is set. The tabular data `df` contains indices of images that can be used to extract images from the above `images_array`. We need the user to pass an argument `image_path` to specify the column that indicates the location of images. This is not necessary because we can directly use `\"image_index\"` instead of `self.kwargs[\"image_path\"]` since we already know which column is needed."
   ],
   "metadata": {
    "collapsed": false,
    "pycharm": {
     "name": "#%% md\n"
    }
   }
  },
  {
   "cell_type": "code",
   "execution_count": 5,
   "outputs": [],
   "source": [
    "from tabensemb.data import AbstractDeriver\n",
    "from tabensemb.data.dataderiver import deriver_mapping\n",
    "\n",
    "class MNISTLoader(AbstractDeriver):\n",
    "    def _required_cols(self):\n",
    "        return [\"image_path\"]\n",
    "\n",
    "    def _defaults(self):\n",
    "        return dict(stacked=False, derived_name=\"images\", intermediate=False, is_continuous=False)\n",
    "\n",
    "    def _derive(self, df, datamodule):\n",
    "        images = images_array[df[self.kwargs[\"image_path\"]]]\n",
    "        print(f\"Loaded images: {images.shape}\")\n",
    "        return images\n",
    "\n",
    "deriver_mapping[\"MNISTLoader\"] = MNISTLoader"
   ],
   "metadata": {
    "collapsed": false,
    "pycharm": {
     "name": "#%%\n"
    }
   }
  },
  {
   "cell_type": "markdown",
   "source": [
    "The network of the official example can be easily migrated to the framework. In the forward passing, loaded images from the data deriver can be accessed in `derived_tensors`, and the key is `\"images\"` as defined above in `_defaults`. The tensor is of the shape `(n_samples, width, height)` and we transform it into `(n_samples, n_channels, width, height)` where `n_channels=1` to meet the requirement of `Net`."
   ],
   "metadata": {
    "collapsed": false,
    "pycharm": {
     "name": "#%% md\n"
    }
   }
  },
  {
   "cell_type": "code",
   "execution_count": 6,
   "outputs": [],
   "source": [
    "from tabensemb.model import TorchModel, AbstractNN\n",
    "\n",
    "class NetNN(AbstractNN):\n",
    "    def __init__(self, datamodule, **kwargs):\n",
    "        super(NetNN, self).__init__(datamodule, **kwargs)\n",
    "        self.net = Net()\n",
    "\n",
    "    def _forward(self, x, derived_tensors):\n",
    "        images = derived_tensors[\"images\"].unsqueeze(1)\n",
    "        return self.net(images)"
   ],
   "metadata": {
    "collapsed": false,
    "pycharm": {
     "name": "#%%\n"
    }
   }
  },
  {
   "cell_type": "markdown",
   "source": [
    "The implementation of the model base is straightforward."
   ],
   "metadata": {
    "collapsed": false,
    "pycharm": {
     "name": "#%% md\n"
    }
   }
  },
  {
   "cell_type": "code",
   "execution_count": 7,
   "outputs": [],
   "source": [
    "class NetModel(TorchModel):\n",
    "    def _initial_values(self, model_name):\n",
    "        return self.trainer.chosen_params\n",
    "\n",
    "    def _space(self, model_name):\n",
    "        return self.trainer.SPACE\n",
    "\n",
    "    def _new_model(self, model_name: str, verbose: bool, **kwargs):\n",
    "        return NetNN(self.trainer.datamodule, **kwargs)\n",
    "\n",
    "    def _get_program_name(self):\n",
    "        return \"NetModel\"\n",
    "\n",
    "    def _get_model_names(self):\n",
    "        return [\"Net\"]"
   ],
   "metadata": {
    "collapsed": false,
    "pycharm": {
     "name": "#%%\n"
    }
   }
  },
  {
   "cell_type": "markdown",
   "source": [
    "Then we configure the `Trainer`. Importantly, the `MNISTLoader` defined above is used to load images, and the argument `image_path` is given here."
   ],
   "metadata": {
    "collapsed": false,
    "pycharm": {
     "name": "#%% md\n"
    }
   }
  },
  {
   "cell_type": "code",
   "execution_count": 8,
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "The project will be saved to /tmp/tmpcdcd59k_/output/mnist/2023-09-18-10-50-06-0_UserInputConfig\n"
     ]
    }
   ],
   "source": [
    "from tabensemb.config import UserConfig\n",
    "from tabensemb.trainer import Trainer\n",
    "\n",
    "cfg = UserConfig.from_dict({\n",
    "    \"database\": \"mnist\",\n",
    "    \"label_name\": [\"target\"],\n",
    "    \"task\": \"multiclass\",\n",
    "    \"data_derivers\": [(\"MNISTLoader\", {\"image_path\": \"image_index\"})],\n",
    "    \"epoch\": 100,\n",
    "})\n",
    "trainer = Trainer(device=device)\n",
    "trainer.load_config(config=cfg)"
   ],
   "metadata": {
    "collapsed": false,
    "pycharm": {
     "name": "#%%\n"
    }
   }
  },
  {
   "cell_type": "markdown",
   "source": [
    "Since we have a separate testing set, during the training stage, we use the first 50,000 images for training and the last 10,000 images for validation and testing. We use the `DataModule.set_data` API instead of `load_data` to configure the dataset using these indices, which will skip the data splitter."
   ],
   "metadata": {
    "collapsed": false,
    "pycharm": {
     "name": "#%% md\n"
    }
   }
  },
  {
   "cell_type": "code",
   "execution_count": 9,
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Loaded images: (60000, 28, 28)\n"
     ]
    }
   ],
   "source": [
    "train_indices = np.arange(50000)\n",
    "val_indices = np.arange(50000, 60000)\n",
    "test_indices = val_indices\n",
    "trainer.datamodule.set_data(train_df, cont_feature_names=[], cat_feature_names=[], label_name=[\"target\"], train_indices=train_indices, val_indices=val_indices, test_indices=test_indices)"
   ],
   "metadata": {
    "collapsed": false,
    "pycharm": {
     "name": "#%%\n"
    }
   }
  },
  {
   "cell_type": "markdown",
   "source": [
    "We can see that the images are loaded in `DataModule.derived_data`"
   ],
   "metadata": {
    "collapsed": false,
    "pycharm": {
     "name": "#%% md\n"
    }
   }
  },
  {
   "cell_type": "code",
   "execution_count": 10,
   "outputs": [
    {
     "data": {
      "text/plain": "(60000, 28, 28)"
     },
     "execution_count": 10,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "trainer.datamodule.derived_data[\"images\"].shape"
   ],
   "metadata": {
    "collapsed": false,
    "pycharm": {
     "name": "#%%\n"
    }
   }
  },
  {
   "cell_type": "markdown",
   "source": [
    "Now train the model. The default loss function is cross entropy loss as shown in the output."
   ],
   "metadata": {
    "collapsed": false,
    "pycharm": {
     "name": "#%% md\n"
    }
   }
  },
  {
   "cell_type": "code",
   "execution_count": 11,
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "\n",
      "-------------Run NetModel-------------\n",
      "\n",
      "Training Net\n",
      "GPU available: True (cuda), used: True\n",
      "TPU available: False, using: 0 TPU cores\n",
      "IPU available: False, using: 0 IPUs\n",
      "HPU available: False, using: 0 HPUs\n",
      "You are using a CUDA device ('NVIDIA GeForce RTX 3090') that has Tensor Cores. To properly utilize them, you should set `torch.set_float32_matmul_precision('medium' | 'high')` which will trade-off precision for performance. For more details, read https://pytorch.org/docs/stable/generated/torch.set_float32_matmul_precision.html#torch.set_float32_matmul_precision\n",
      "LOCAL_RANK: 0 - CUDA_VISIBLE_DEVICES: [0]\n",
      "\n",
      "  | Name                | Type             | Params\n",
      "---------------------------------------------------------\n",
      "0 | default_loss_fn     | CrossEntropyLoss | 0     \n",
      "1 | default_output_norm | Softmax          | 0     \n",
      "2 | net                 | Net              | 1.2 M \n",
      "---------------------------------------------------------\n",
      "1.2 M     Trainable params\n",
      "0         Non-trainable params\n",
      "1.2 M     Total params\n",
      "4.800     Total estimated model params size (MB)\n",
      "Epoch: 1/100, Train loss: 0.6124, Val loss: 0.1363, Min val loss: 0.1363, Min ES val loss: 0.1363, Epoch time: 1.463s.\n",
      "Epoch: 20/100, Train loss: 0.0194, Val loss: 0.0381, Min val loss: 0.0372, Min ES val loss: 0.0372, Epoch time: 1.424s.\n",
      "Epoch: 40/100, Train loss: 0.0119, Val loss: 0.0428, Min val loss: 0.0372, Min ES val loss: 0.0372, Epoch time: 1.339s.\n",
      "Epoch: 60/100, Train loss: 0.0084, Val loss: 0.0490, Min val loss: 0.0372, Min ES val loss: 0.0372, Epoch time: 1.189s.\n",
      "Epoch: 80/100, Train loss: 0.0073, Val loss: 0.0527, Min val loss: 0.0372, Min ES val loss: 0.0372, Epoch time: 1.075s.\n",
      "Epoch: 100/100, Train loss: 0.0063, Val loss: 0.0533, Min val loss: 0.0372, Min ES val loss: 0.0372, Epoch time: 1.164s.\n",
      "`Trainer.fit` stopped: `max_epochs=100` reached.\n",
      "Training log_loss loss: 0.00392\n",
      "Validation log_loss loss: 0.03531\n",
      "Testing log_loss loss: 0.03531\n",
      "Trainer saved. To load the trainer, run trainer = load_trainer(path='/tmp/tmpcdcd59k_/output/mnist/2023-09-18-10-50-06-0_UserInputConfig/trainer.pkl')\n",
      "\n",
      "-------------NetModel End-------------\n",
      "\n"
     ]
    }
   ],
   "source": [
    "trainer.clear_modelbase()\n",
    "trainer.add_modelbases([NetModel(trainer)])\n",
    "trainer.train(stderr_to_stdout=True)"
   ],
   "metadata": {
    "collapsed": false,
    "pycharm": {
     "name": "#%%\n"
    }
   }
  },
  {
   "cell_type": "markdown",
   "source": [
    "It is easy to make inferences on the testing set for both predicted classes and probabilities. The data deriver again loads images from `images_array`."
   ],
   "metadata": {
    "collapsed": false,
    "pycharm": {
     "name": "#%% md\n"
    }
   }
  },
  {
   "cell_type": "code",
   "execution_count": 12,
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Loaded images: (10000, 28, 28)\n",
      "Loaded images: (10000, 28, 28)\n"
     ]
    }
   ],
   "source": [
    "predictions = trainer.get_modelbase(\"NetModel\").predict(test_df, model_name=\"Net\")\n",
    "proba = trainer.get_modelbase(\"NetModel\").predict_proba(test_df, model_name=\"Net\")"
   ],
   "metadata": {
    "collapsed": false,
    "pycharm": {
     "name": "#%%\n"
    }
   }
  },
  {
   "cell_type": "markdown",
   "source": [
    "The prediction accuracy reaches around 99% on the testing set."
   ],
   "metadata": {
    "collapsed": false,
    "pycharm": {
     "name": "#%% md\n"
    }
   }
  },
  {
   "cell_type": "code",
   "execution_count": 13,
   "outputs": [
    {
     "data": {
      "text/plain": "0.9914"
     },
     "execution_count": 13,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "from tabensemb.utils import auto_metric_sklearn\n",
    "\n",
    "auto_metric_sklearn(targets_array[60000:], proba, \"accuracy_score\", \"multiclass\")"
   ],
   "metadata": {
    "collapsed": false,
    "pycharm": {
     "name": "#%%\n"
    }
   }
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 2
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython2",
   "version": "2.7.6"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 0
}