{ "cells": [ { "cell_type": "markdown", "source": [ "# New data derivers\n", "\n", "Only a few derivers are currently provided in this package. A deriver calculates new features (continuous or categorical) from existing features, or loads images, text, etc. as multimodal data. Here, the source code of the built-in `tabensemb.data.dataderiver.RelativeDeriver` is adapted to demonstrate how to implement a new deriver.\n" ], "metadata": { "collapsed": false, "pycharm": { "name": "#%% md\n" } } }, { "cell_type": "code", "execution_count": 1, "outputs": [], "source": [ "from tabensemb.data.dataderiver import AbstractDeriver" ], "metadata": { "collapsed": false, "pycharm": { "name": "#%%\n" } } }, { "cell_type": "markdown", "source": [ "A data deriver inherits from `tabensemb.data.dataderiver.AbstractDeriver` and must implement four methods:\n", "\n", "* `_required_cols`: Names of the arguments whose values are columns that must exist in the tabular dataset. With the following code, the arguments `absolute_col` and `relative2_col` must be given in the configuration, for example `\"data_derivers\": [(\"MyRelativeDeriver\", {\"absolute_col\": \"cont_0\", \"relative2_col\": \"cont_1\"})]`\n", "\n", "```python\n", "class MyRelativeDeriver(AbstractDeriver):\n", "    def _required_cols(self):\n", "        return [\"absolute_col\", \"relative2_col\"]\n", "```\n", "\n", "* `_required_kwargs`: Parameters that must be specified in the configuration. 
With the following code, the parameter `some_param` must be given in the configuration, for example `\"data_derivers\": [(\"MyRelativeDeriver\", {\"some_param\": 1.5})]`\n", "\n", "```python\n", "    def _required_kwargs(self):\n", "        return [\"some_param\"]\n", "```\n", "\n", "**Remark**: `stacked`, `intermediate`, `derived_name`, and `is_continuous` are required kwargs shared by all derivers, so they do not need to be listed in `_required_kwargs`.\n", "\n", "* `_defaults`: Default values for any of the arguments in `_required_cols`, `_required_kwargs`, and the shared kwargs `[\"stacked\", \"intermediate\", \"derived_name\", \"is_continuous\"]`. If a default value is given, no error is raised when that argument is missing from the configuration.\n", "\n", "```python\n", "    def _defaults(self):\n", "        return dict(stacked=True, intermediate=False, is_continuous=True)\n", "```\n", "\n", "* `_derive`: The main derivation step. It receives the tabular data (a `DataFrame`) and a `DataModule`, and should return an `np.ndarray`. The returned array must not be 1-D; a single derived column should be reshaped to `(n_samples, 1)`. 
Arguments are checked and stored in `self.kwargs` during initialization, so `_derive` can read them from there.\n", "\n", "```python\n", "    def _derive(self, df, datamodule):\n", "        absolute_col = self.kwargs[\"absolute_col\"]\n", "        relative2_col = self.kwargs[\"relative2_col\"]\n", "        some_param = self.kwargs[\"some_param\"]  # available, but unused in this example\n", "        stacked = self.kwargs[\"stacked\"]  # available, but unused in this example\n", "\n", "        # Ratio of the two columns, reshaped from 1-D to (n_samples, 1)\n", "        relative = df[absolute_col] / df[relative2_col]\n", "        relative = relative.values.reshape(-1, 1)\n", "        return relative\n", "```" ], "metadata": { "collapsed": false, "pycharm": { "name": "#%% md\n" } } }, { "cell_type": "code", "execution_count": 2, "metadata": { "collapsed": true, "pycharm": { "name": "#%%\n" } }, "outputs": [], "source": [ "class MyRelativeDeriver(AbstractDeriver):\n", "    def _required_cols(self):\n", "        return [\"absolute_col\", \"relative2_col\"]\n", "\n", "    def _required_kwargs(self):\n", "        return [\"some_param\"]\n", "\n", "    def _defaults(self):\n", "        return dict(stacked=True, intermediate=False, is_continuous=True)\n", "\n", "    def _derive(self, df, datamodule):\n", "        absolute_col = self.kwargs[\"absolute_col\"]\n", "        relative2_col = self.kwargs[\"relative2_col\"]\n", "        some_param = self.kwargs[\"some_param\"]  # available, but unused in this example\n", "        stacked = self.kwargs[\"stacked\"]  # available, but unused in this example\n", "\n", "        # Ratio of the two columns, reshaped from 1-D to (n_samples, 1)\n", "        relative = df[absolute_col] / df[relative2_col]\n", "        relative = relative.values.reshape(-1, 1)\n", "        return relative" ] }, { "cell_type": "markdown", "source": [ "The implemented deriver must be registered in `deriver_mapping` as follows so that `DataModule.set_data_derivers` can recognize it by name." 
], "metadata": { "collapsed": false, "pycharm": { "name": "#%% md\n" } } }, { "cell_type": "code", "execution_count": 3, "outputs": [], "source": [ "from tabensemb.data.dataderiver import deriver_mapping\n", "deriver_mapping[\"MyRelativeDeriver\"] = MyRelativeDeriver" ], "metadata": { "collapsed": false, "pycharm": { "name": "#%%\n" } } }, { "cell_type": "code", "execution_count": 4, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "The project will be saved to ../../../../output/sample/2023-09-18-18-15-00-0_sample\n" ] } ], "source": [ "from tabensemb.trainer import Trainer\n", "import tabensemb\n", "\n", "prefix = \"../../../../\"\n", "tabensemb.setting[\"default_output_path\"] = prefix + \"output\"\n", "tabensemb.setting[\"default_config_path\"] = prefix + \"configs\"\n", "tabensemb.setting[\"default_data_path\"] = prefix + \"data\"\n", "\n", "trainer = Trainer(device=\"cpu\")\n", "\n", "trainer.load_config(\"sample\")" ], "metadata": { "collapsed": false, "pycharm": { "name": "#%%\n" } } }, { "cell_type": "markdown", "source": [ "If `stacked` is `True`, the derived column `cont_0_relative2_cont_1` is appended to the tabular data and registered as a continuous feature:" ], "metadata": { "collapsed": false, "pycharm": { "name": "#%% md\n" } } }, { "cell_type": "code", "execution_count": 5, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Dataset size: 153 51 52\n", "Data saved to ../../../../output/sample/2023-09-18-18-15-00-0_sample (data.csv and tabular_data.csv).\n", "cont_0_relative2_cont_1 in continuous features?: True\n" ] }, { "data": { "text/plain": " cont_0 cont_1 cont_2 cont_3 cont_4 cont_5 cont_6 \\\n0 -1.306527 0.065895 -0.118164 -0.159573 1.658131 -1.346718 -0.680178 \n1 2.011257 0.117717 0.195070 0.527004 -0.044595 0.616887 -1.781563 \n2 -1.216077 0.065895 -0.743672 0.730184 0.140672 1.272954 -0.159012 \n3 0.559299 0.117717 -0.431096 -0.809627 -1.063696 -0.860153 0.572751 \n4 0.910179 -0.213096 0.786328 -0.042257 0.317218 0.379152 -0.466419 \n.. ... ... ... ... ... ... ... 
\n251 0.280442 -0.206904 0.841631 0.880179 -0.993124 -1.570623 -0.249459 \n252 -1.165150 -1.070753 0.465662 1.054452 0.900826 -0.179925 -1.536244 \n253 -0.069856 -0.186691 -1.021913 -1.143641 0.250114 1.040239 -1.150438 \n254 -1.031482 -0.860262 -0.061638 0.328301 -1.429991 -1.048170 -1.432735 \n255 -1.461733 0.960693 0.367545 1.329063 -0.683440 -1.184687 0.190312 \n\n cont_7 cont_8 cont_9 ... cat_4 cat_5 cat_6 cat_7 \\\n0 -1.334258 0.666383 -0.460720 ... 2 category_4 3 4 \n1 0.354758 -0.729045 0.196557 ... 3 category_3 3 1 \n2 -0.475175 0.240057 0.100159 ... 4 category_3 4 1 \n3 -0.467441 0.677557 1.307184 ... 1 category_3 4 2 \n4 -0.017020 -0.944446 -0.410050 ... 0 category_2 0 2 \n.. ... ... ... ... ... ... ... ... \n251 0.643314 0.049495 0.493837 ... 2 category_2 2 3 \n252 1.178780 1.488252 1.895889 ... 2 category_4 4 2 \n253 0.258798 -0.836111 0.642211 ... 3 category_3 2 2 \n254 0.607112 0.087531 0.938747 ... 0 category_3 4 1 \n255 -0.521580 -0.851729 1.822724 ... 1 category_3 4 1 \n\n cat_8 cat_9 target target_binary target_multi_class \\\n0 4 3 -71.084217 0 1 \n1 3 2 13.415675 1 2 \n2 0 2 -47.492280 0 2 \n3 0 0 -94.482614 1 2 \n4 3 0 195.819531 1 3 \n.. ... ... ... ... ... \n251 0 2 -171.249549 0 0 \n252 1 1 23.708442 0 2 \n253 2 2 -33.414215 1 1 \n254 4 4 -359.199191 0 4 \n255 1 4 -135.199100 1 2 \n\n cont_0_relative2_cont_1 \n0 -19.827301 \n1 17.085552 \n2 -18.454666 \n3 4.751225 \n4 -4.271217 \n.. ... \n251 -1.355422 \n252 1.088160 \n253 0.374183 \n254 1.199032 \n255 -1.521539 \n\n[256 rows x 24 columns]", "text/html": "
" }, "execution_count": 5, "metadata": {}, "output_type": "execute_result" } ], "source": [ "trainer.datamodule.set_data_derivers([(\"MyRelativeDeriver\", {\"absolute_col\": \"cont_0\", \"relative2_col\": \"cont_1\", \"derived_name\": \"cont_0_relative2_cont_1\", \"some_param\": 1.0, \"stacked\": True})])\n", "trainer.load_data()\n", "print(f\"cont_0_relative2_cont_1 in continuous features?: {'cont_0_relative2_cont_1' in trainer.cont_feature_names}\")\n", "trainer.df" ], "metadata": { "collapsed": false, "pycharm": { "name": "#%%\n" } } }, { "cell_type": "markdown", "source": [ "If `stacked` is `True` and `intermediate` is also `True`, the derived column is still appended to the tabular data, but it is not used as a feature (it does not appear in `trainer.cont_feature_names`):" ], "metadata": { "collapsed": false, "pycharm": { "name": "#%% md\n" } } }, { "cell_type": "code", "execution_count": 6, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Using previously used data path ../../../../data/sample.csv\n", "Dataset size: 153 51 52\n", "Data saved to ../../../../output/sample/2023-09-18-18-15-00-0_sample (data.csv and tabular_data.csv).\n", "cont_0_relative2_cont_1 in continuous features?: False\n" ] }, { "data": { "text/plain": " cont_0 cont_1 cont_2 cont_3 cont_4 cont_5 cont_6 \\\n0 -1.306527 -0.409756 -0.118164 -0.159573 1.658131 -1.346718 -0.680178 \n1 2.011257 -0.409756 0.195070 0.527004 -0.044595 0.616887 -1.781563 \n2 -1.216077 0.104704 -0.743672 0.730184 0.140672 1.272954 -0.159012 \n3 0.559299 0.104704 -0.431096 -0.809627 -1.063696 -0.860153 0.572751 \n4 0.910179 -0.409756 0.786328 -0.042257 0.317218 0.379152 -0.466419 \n.. ... ... ... ... ... ... ... \n251 0.280442 -0.206904 0.841631 0.880179 -0.993124 -1.570623 -0.249459 \n252 -1.165150 -1.070753 0.465662 1.054452 0.900826 -0.179925 -1.536244 \n253 -0.069856 -0.186691 -1.021913 -1.143641 0.250114 1.040239 -1.150438 \n254 -1.031482 -0.860262 -0.061638 0.328301 -1.429991 -1.048170 -1.432735 \n255 -1.461733 0.960693 0.367545 1.329063 -0.683440 -1.184687 0.190312 \n\n cont_7 cont_8 cont_9 ... 
cat_4 cat_5 cat_6 cat_7 \\\n0 -1.334258 0.666383 -0.460720 ... 2 category_4 3 4 \n1 0.354758 -0.729045 0.196557 ... 3 category_3 3 1 \n2 -0.475175 0.240057 0.100159 ... 4 category_3 4 1 \n3 -0.467441 0.677557 1.307184 ... 1 category_3 4 2 \n4 -0.017020 -0.944446 -0.410050 ... 0 category_2 0 2 \n.. ... ... ... ... ... ... ... ... \n251 0.643314 0.049495 0.493837 ... 2 category_2 2 3 \n252 1.178780 1.488252 1.895889 ... 2 category_4 4 2 \n253 0.258798 -0.836111 0.642211 ... 3 category_3 2 2 \n254 0.607112 0.087531 0.938747 ... 0 category_3 4 1 \n255 -0.521580 -0.851729 1.822724 ... 1 category_3 4 1 \n\n cat_8 cat_9 target target_binary target_multi_class \\\n0 4 3 -71.084217 0 1 \n1 3 2 13.415675 1 2 \n2 0 2 -47.492280 0 2 \n3 0 0 -94.482614 1 2 \n4 3 0 195.819531 1 3 \n.. ... ... ... ... ... \n251 0 2 -171.249549 0 0 \n252 1 1 23.708442 0 2 \n253 2 2 -33.414215 1 1 \n254 4 4 -359.199191 0 4 \n255 1 4 -135.199100 1 2 \n\n cont_0_relative2_cont_1 \n0 3.188552 \n1 -4.908431 \n2 -11.614467 \n3 5.341736 \n4 -2.221273 \n.. ... \n251 -1.355422 \n252 1.088160 \n253 0.374183 \n254 1.199032 \n255 -1.521539 \n\n[256 rows x 24 columns]", "text/html": "
" }, "execution_count": 6, "metadata": {}, "output_type": "execute_result" } ], "source": [ "trainer.datamodule.set_data_derivers([(\"MyRelativeDeriver\", {\"absolute_col\": \"cont_0\", \"relative2_col\": \"cont_1\", \"derived_name\": \"cont_0_relative2_cont_1\", \"some_param\": 1.0, \"stacked\": True, \"intermediate\": True})])\n", "trainer.load_data()\n", "print(f\"cont_0_relative2_cont_1 in continuous features?: {'cont_0_relative2_cont_1' in trainer.cont_feature_names}\")\n", "trainer.df" ], "metadata": { "collapsed": false, "pycharm": { "name": "#%%\n" } } }, { "cell_type": "markdown", "source": [ "If `stacked` is `False`, the derived column is not appended to the tabular data (which therefore has 23 instead of 24 columns) and is not used as a tabular feature; instead, it is stored separately in `trainer.derived_data` under its `derived_name`:" ], "metadata": { "collapsed": false, "pycharm": { "name": "#%% md\n" } } }, { "cell_type": "code", "execution_count": 7, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Using previously used data path ../../../../data/sample.csv\n", "Dataset size: 153 51 52\n", "Data saved to ../../../../output/sample/2023-09-18-18-15-00-0_sample (data.csv and tabular_data.csv).\n", "cont_0_relative2_cont_1 in continuous features?: False\n" ] }, { "data": { "text/plain": " cont_0 cont_1 cont_2 cont_3 cont_4 cont_5 cont_6 \\\n0 -1.306527 0.138315 -0.118164 -0.159573 1.658131 -1.346718 -0.680178 \n1 2.011257 -0.006111 0.195070 0.527004 -0.044595 0.616887 -1.781563 \n2 -1.216077 0.138315 -0.743672 0.730184 0.140672 1.272954 -0.159012 \n3 0.559299 -0.006111 -0.431096 -0.809627 -1.063696 -0.860153 0.572751 \n4 0.910179 -0.006111 0.786328 -0.042257 0.317218 0.379152 -0.466419 \n.. ... ... ... ... ... ... ... \n251 0.280442 -0.206904 0.841631 0.880179 -0.993124 -1.570623 -0.249459 \n252 -1.165150 -1.070753 0.465662 1.054452 0.900826 -0.179925 -1.536244 \n253 -0.069856 -0.186691 -1.021913 -1.143641 0.250114 1.040239 -1.150438 \n254 -1.031482 -0.860262 -0.061638 0.328301 -1.429991 -1.048170 -1.432735 \n255 -1.461733 0.960693 0.367545 1.329063 -0.683440 -1.184687 0.190312 \n\n cont_7 cont_8 cont_9 ... 
cat_3 cat_4 cat_5 cat_6 \\\n0 -1.334258 0.666383 -0.460720 ... 0 2 category_4 3 \n1 0.354758 -0.729045 0.196557 ... 4 3 category_3 3 \n2 -0.475175 0.240057 0.100159 ... 0 4 category_3 4 \n3 -0.467441 0.677557 1.307184 ... 4 1 category_3 4 \n4 -0.017020 -0.944446 -0.410050 ... 1 0 category_2 0 \n.. ... ... ... ... ... ... ... ... \n251 0.643314 0.049495 0.493837 ... 1 2 category_2 2 \n252 1.178780 1.488252 1.895889 ... 4 2 category_4 4 \n253 0.258798 -0.836111 0.642211 ... 0 3 category_3 2 \n254 0.607112 0.087531 0.938747 ... 0 0 category_3 4 \n255 -0.521580 -0.851729 1.822724 ... 2 1 category_3 4 \n\n cat_7 cat_8 cat_9 target target_binary target_multi_class \n0 4 4 3 -71.084217 0 1 \n1 1 3 2 13.415675 1 2 \n2 1 0 2 -47.492280 0 2 \n3 2 0 0 -94.482614 1 2 \n4 2 3 0 195.819531 1 3 \n.. ... ... ... ... ... ... \n251 3 0 2 -171.249549 0 0 \n252 2 1 1 23.708442 0 2 \n253 2 2 2 -33.414215 1 1 \n254 1 4 4 -359.199191 0 4 \n255 1 1 4 -135.199100 1 2 \n\n[256 rows x 23 columns]", "text/html": "
" }, "execution_count": 7, "metadata": {}, "output_type": "execute_result" } ], "source": [ "trainer.datamodule.set_data_derivers([(\"MyRelativeDeriver\", {\"absolute_col\": \"cont_0\", \"relative2_col\": \"cont_1\", \"derived_name\": \"cont_0_relative2_cont_1\", \"some_param\": 1.0, \"stacked\": False})])\n", "trainer.load_data()\n", "print(f\"cont_0_relative2_cont_1 in continuous features?: {'cont_0_relative2_cont_1' in trainer.cont_feature_names}\")\n", "trainer.df" ], "metadata": { "collapsed": false, "pycharm": { "name": "#%%\n" } } }, { "cell_type": "code", "execution_count": 8, "outputs": [ { "data": { "text/plain": "dict_keys(['cont_0_relative2_cont_1', 'categorical'])" }, "execution_count": 8, "metadata": {}, "output_type": "execute_result" } ], "source": [ "trainer.derived_data.keys()" ], "metadata": { "collapsed": false, "pycharm": { "name": "#%%\n" } } } ], "metadata": { "kernelspec": { "display_name": "Python 3", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 2 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython2", "version": "2.7.6" } }, "nbformat": 4, "nbformat_minor": 0 }