{"id":178873,"date":"2026-03-13T16:10:13","date_gmt":"2026-03-13T16:10:13","guid":{"rendered":"http:\/\/ktromedia.com\/?p=178873"},"modified":"2026-03-13T16:10:13","modified_gmt":"2026-03-13T16:10:13","slug":"from-text-to-tables-feature-engineering-with-llms-for-tabular-data","status":"publish","type":"post","link":"https:\/\/ktromedia.com\/?p=178873","title":{"rendered":"From Text to Tables: Feature Engineering with LLMs for Tabular Data"},"content":{"rendered":"<div id=\"\">\n<p>In this article, you will learn how to use a pre-trained large language model to extract structured features from text and combine them with numeric columns to train a supervised classifier.<\/p>\n<p>Topics we will cover include:<\/p>\n<ul>\n<li>Creating a toy dataset with mixed text and numeric fields for classification<\/li>\n<li>Using a Groq-hosted LLaMA model to extract JSON features from ticket text with a Pydantic schema<\/li>\n<li>Training and evaluating a scikit-learn classifier on the engineered tabular dataset<\/li>\n<\/ul>\n<p>Let\u2019s not waste any more time.<\/p>\n<div style=\"width: 810px\" class=\"wp-caption aligncenter\"><\/p>\n<p class=\"wp-caption-text\">From Text to Tables: Feature Engineering with LLMs for Tabular Data<br \/>Image by Editor<\/p>\n<\/div>\n<h2>Introduction<\/h2>\n<p>While <strong>large language models<\/strong> (LLMs) are typically used for conversational purposes in use cases that revolve around natural language interactions, they can also assist with tasks like <strong>feature engineering<\/strong> on complex datasets. Specifically, you can leverage pre-trained LLMs from providers like <strong><a href=\"https:\/\/groq.com\/\" target=\"_blank\">Groq<\/a><\/strong> (for example, models from the <strong>Llama<\/strong> family) to undertake data transformation and preprocessing tasks, including turning unstructured data like text into fully structured, tabular data that can be used to fuel <strong>predictive machine learning models<\/strong>.<\/p>\n<p>In this article, I will guide you through the full process of applying feature engineering to structured text, turning it into tabular data suitable for a machine learning model \u2014 namely, a classifier trained on features created from text by using an LLM.<\/p>\n<h2>Setup and Imports<\/h2>\n<p>First, we will make all the necessary imports for this practical example:<\/p>\n<div id=\"urvanov-syntax-highlighter-69b06e58d100d646908504\" class=\"urvanov-syntax-highlighter-syntax crayon-theme-classic urvanov-syntax-highlighter-font-monaco urvanov-syntax-highlighter-os-pc print-yes notranslate\" data-settings=\" minimize scroll-mouseover disable-anim\" style=\" margin-top: 12px; margin-bottom: 12px; font-size: 12px !important; line-height: 15px !important;\">\n<p><textarea wrap=\"soft\" class=\"urvanov-syntax-highlighter-plain print-no\" data-settings=\"dblclick\" readonly=\"readonly\" style=\"-moz-tab-size:4; -o-tab-size:4; -webkit-tab-size:4; tab-size:4; font-size: 12px !important; line-height: 15px !important;\"><br \/>\nimport pandas as pd&#13;<br \/>\nimport json&#13;<br \/>\nfrom pydantic import BaseModel, Field&#13;<br \/>\nfrom openai import OpenAI&#13;<br \/>\nfrom google.colab import userdata&#13;<br \/>\nfrom sklearn.ensemble import RandomForestClassifier&#13;<br \/>\nfrom sklearn.model_selection import train_test_split&#13;<br \/>\nfrom sklearn.metrics import classification_report&#13;<br \/>\nfrom sklearn.preprocessing import StandardScaler<\/textarea><\/p>\n<div class=\"urvanov-syntax-highlighter-main\" style=\"\">\n<table class=\"crayon-table\">\n<tr class=\"urvanov-syntax-highlighter-row\">\n<td class=\"crayon-nums \" data-settings=\"show\">\n<\/td>\n<td class=\"urvanov-syntax-highlighter-code\">\n<div class=\"crayon-pre\" style=\"font-size: 12px !important; line-height: 15px !important; -moz-tab-size:4; -o-tab-size:4; -webkit-tab-size:4; tab-size:4;\">\n<p><span class=\"crayon-e\">import <\/span><span class=\"crayon-e\">pandas <\/span><span class=\"crayon-st\">as<\/span><span class=\"crayon-h\"> <\/span><span class=\"crayon-e\">pd<\/span><\/p>\n<p><span class=\"crayon-e\">import <\/span><span class=\"crayon-e\">json<\/span><\/p>\n<p><span class=\"crayon-e\">from <\/span><span class=\"crayon-e\">pydantic <\/span><span class=\"crayon-e\">import <\/span><span class=\"crayon-v\">BaseModel<\/span><span class=\"crayon-sy\">,<\/span><span class=\"crayon-h\"> <\/span><span class=\"crayon-e\">Field<\/span><\/p>\n<p><span class=\"crayon-e\">from <\/span><span class=\"crayon-e\">openai <\/span><span class=\"crayon-e\">import <\/span><span class=\"crayon-e\">OpenAI<\/span><\/p>\n<p><span class=\"crayon-e\">from <\/span><span class=\"crayon-v\">google<\/span><span class=\"crayon-sy\">.<\/span><span class=\"crayon-e\">colab <\/span><span class=\"crayon-e\">import <\/span><span class=\"crayon-e\">userdata<\/span><\/p>\n<p><span class=\"crayon-e\">from <\/span><span class=\"crayon-v\">sklearn<\/span><span class=\"crayon-sy\">.<\/span><span class=\"crayon-e\">ensemble <\/span><span class=\"crayon-e\">import <\/span><span class=\"crayon-e\">RandomForestClassifier<\/span><\/p>\n<p><span class=\"crayon-e\">from <\/span><span class=\"crayon-v\">sklearn<\/span><span class=\"crayon-sy\">.<\/span><span class=\"crayon-e\">model_selection <\/span><span class=\"crayon-e\">import <\/span><span class=\"crayon-e\">train_test_split<\/span><\/p>\n<p><span class=\"crayon-e\">from <\/span><span class=\"crayon-v\">sklearn<\/span><span class=\"crayon-sy\">.<\/span><span class=\"crayon-e\">metrics <\/span><span class=\"crayon-e\">import <\/span><span class=\"crayon-e\">classification_report<\/span><\/p>\n<p><span class=\"crayon-e\">from <\/span><span class=\"crayon-v\">sklearn<\/span><span class=\"crayon-sy\">.<\/span><span class=\"crayon-e\">preprocessing <\/span><span class=\"crayon-e\">import <\/span><span class=\"crayon-v\">StandardScaler<\/span><\/p>\n<\/div>\n<\/td>\n<\/tr>\n<\/table><\/div>\n<\/p><\/div>\n<p>Note that besides common libraries for machine learning and data preprocessing like scikit-learn, we import the <code>OpenAI<\/code> class \u2014 not because we will directly use an OpenAI model, but because many LLM APIs (including Groq\u2019s) have adopted the same interface style and specifications as OpenAI. This class therefore helps you interact with a variety of providers and access a wide range of LLMs through a single client, including Llama models via Groq, as we will see shortly.<\/p>\n<p>Next, we set up a Groq client to enable access to a pre-trained LLM that we can call via API for inference during execution:<\/p>\n<div id=\"urvanov-syntax-highlighter-69b06e58d101b828439773\" class=\"urvanov-syntax-highlighter-syntax crayon-theme-classic urvanov-syntax-highlighter-font-monaco urvanov-syntax-highlighter-os-pc print-yes notranslate\" data-settings=\" minimize scroll-mouseover disable-anim\" style=\" margin-top: 12px; margin-bottom: 12px; font-size: 12px !important; line-height: 15px !important;\">\n<p><textarea wrap=\"soft\" class=\"urvanov-syntax-highlighter-plain print-no\" data-settings=\"dblclick\" readonly=\"readonly\" style=\"-moz-tab-size:4; -o-tab-size:4; -webkit-tab-size:4; tab-size:4; font-size: 12px !important; line-height: 15px !important;\"><br \/>\ngroq_api_key = userdata.get(&#8216;GROQ_API_KEY&#8217;)&#13;<br \/>\nclient = OpenAI(&#13;<br \/>\n    base_url=&#8221;https:\/\/api.groq.com\/openai\/v1&#8243;,&#13;<br \/>\n    api_key=groq_api_key &#13;<br \/>\n)<\/textarea><\/p>\n<div class=\"urvanov-syntax-highlighter-main\" style=\"\">\n<table class=\"crayon-table\">\n<tr class=\"urvanov-syntax-highlighter-row\">\n<td class=\"crayon-nums \" data-settings=\"show\">\n<\/td>\n<td class=\"urvanov-syntax-highlighter-code\">\n<div class=\"crayon-pre\" style=\"font-size: 12px !important; line-height: 15px !important; -moz-tab-size:4; -o-tab-size:4; -webkit-tab-size:4; tab-size:4;\">\n<p><span class=\"crayon-v\">groq_api_key<\/span><span class=\"crayon-h\"> <\/span><span class=\"crayon-o\">=<\/span><span class=\"crayon-h\"> <\/span><span class=\"crayon-v\">userdata<\/span><span class=\"crayon-sy\">.<\/span><span class=\"crayon-e\">get<\/span><span class=\"crayon-sy\">(<\/span><span class=\"crayon-s\">&#8216;GROQ_API_KEY&#8217;<\/span><span class=\"crayon-sy\">)<\/span><\/p>\n<p><span class=\"crayon-v\">client<\/span><span class=\"crayon-h\"> <\/span><span class=\"crayon-o\">=<\/span><span class=\"crayon-h\"> <\/span><span class=\"crayon-e\">OpenAI<\/span><span class=\"crayon-sy\">(<\/span><\/p>\n<p><span class=\"crayon-h\">\u00a0\u00a0\u00a0\u00a0<\/span><span class=\"crayon-v\">base_url<\/span><span class=\"crayon-o\">=<\/span><span class=\"crayon-s\">&#8220;https:\/\/api.groq.com\/openai\/v1&#8221;<\/span><span class=\"crayon-sy\">,<\/span><\/p>\n<p><span class=\"crayon-h\">\u00a0\u00a0\u00a0\u00a0<\/span><span class=\"crayon-v\">api_key<\/span><span class=\"crayon-o\">=<\/span><span class=\"crayon-v\">groq_api<\/span><span class=\"crayon-sy\">_<\/span>key<span class=\"crayon-h\"> <\/span><\/p>\n<p><span class=\"crayon-sy\">)<\/span><\/p>\n<\/div>\n<\/td>\n<\/tr>\n<\/table><\/div>\n<\/p><\/div>\n<p><strong>Important note<\/strong>: for the above code to work, you need to define an API secret key for Groq. In Google Colab, you can do this through the \u201cSecrets\u201d icon on the left-hand side bar (this icon looks like a key). Here, give your key the name <code>'GROQ_API_KEY'<\/code>, then register on the <a href=\"https:\/\/console.groq.com\/keys\" target=\"_blank\">Groq website<\/a> to get an actual key, and paste it into the value field.<\/p>\n<h2>Creating a Toy Ticket Dataset<\/h2>\n<p>The next step generates a synthetic, partly random toy dataset for illustrative purposes. If you have your own text dataset, feel free to adapt the code accordingly and use your own.<\/p>\n<div id=\"urvanov-syntax-highlighter-69b06e58d101f541712639\" class=\"urvanov-syntax-highlighter-syntax crayon-theme-classic urvanov-syntax-highlighter-font-monaco urvanov-syntax-highlighter-os-pc print-yes notranslate\" data-settings=\" minimize scroll-mouseover disable-anim\" style=\" margin-top: 12px; margin-bottom: 12px; font-size: 12px !important; line-height: 15px !important;\">\n<p><textarea wrap=\"soft\" class=\"urvanov-syntax-highlighter-plain print-no\" data-settings=\"dblclick\" readonly=\"readonly\" style=\"-moz-tab-size:4; -o-tab-size:4; -webkit-tab-size:4; tab-size:4; font-size: 12px !important; line-height: 15px !important;\"><br \/>\nimport random&#13;<br \/>\nimport time&#13;<br \/>\n&#13;<br \/>\nrandom.seed(42)&#13;<br \/>\ncategories = [&#8220;access&#8221;, &#8220;inquiry&#8221;, &#8220;software&#8221;, &#8220;billing&#8221;, &#8220;hardware&#8221;]&#13;<br \/>\n&#13;<br \/>\ntemplates = {&#13;<br \/>\n    &#8220;access&#8221;: [&#13;<br \/>\n        &#8220;I&#8217;ve been locked out of my account for {days} days and need urgent help!&#8221;,&#13;<br \/>\n        &#8220;I can&#8217;t log in, it keeps saying bad password.&#8221;,&#13;<br \/>\n        &#8220;Reset my access credentials immediately.&#8221;,&#13;<br \/>\n        &#8220;My 2FA isn&#8217;t working, please help me get into my account.&#8221;&#13;<br \/>\n    ],&#13;<br \/>\n    &#8220;inquiry&#8221;: [&#13;<br \/>\n        &#8220;When will my new credit card arrive in the mail?&#8221;,&#13;<br \/>\n        &#8220;Just checking on the status of my recent order.&#8221;,&#13;<br \/>\n        &#8220;What are your business hours on weekends?&#8221;,&#13;<br \/>\n        &#8220;Can I upgrade my current plan to the premium tier?&#8221;&#13;<br \/>\n    ],&#13;<br \/>\n    &#8220;software&#8221;: [&#13;<br \/>\n        &#8220;The app keeps crashing every time I try to view my transaction history.&#8221;,&#13;<br \/>\n        &#8220;Software bug: the submit button is greyed out.&#8221;,&#13;<br \/>\n        &#8220;Pages are loading incredibly slowly since the last update.&#8221;,&#13;<br \/>\n        &#8220;I&#8217;m getting a 500 Internal Server Error on the dashboard.&#8221;&#13;<br \/>\n    ],&#13;<br \/>\n    &#8220;billing&#8221;: [&#13;<br \/>\n        &#8220;I need a refund for the extra charges on my bill.&#8221;,&#13;<br \/>\n        &#8220;Why was I billed twice this month?&#8221;,&#13;<br \/>\n        &#8220;Please update my payment method, the old card expired.&#8221;,&#13;<br \/>\n        &#8220;I didn&#8217;t authorize this $49.99 transaction.&#8221;&#13;<br \/>\n    ],&#13;<br \/>\n    &#8220;hardware&#8221;: [&#13;<br \/>\n        &#8220;My hardware token is broken, I can&#8217;t log in.&#8221;,&#13;<br \/>\n        &#8220;The screen on my physical device is cracked.&#8221;,&#13;<br \/>\n        &#8220;The card reader isn&#8217;t scanning properly anymore.&#8221;,&#13;<br \/>\n        &#8220;Battery drains in 10 minutes, I need a replacement unit.&#8221;&#13;<br \/>\n    ]&#13;<br \/>\n}&#13;<br \/>\n&#13;<br \/>\ndata = []&#13;<br \/>\nfor _ in range(100):&#13;<br \/>\n    cat = random.choice(categories)&#13;<br \/>\n    # Injecting a random number of days into specific templates to foster variety&#13;<br \/>\n    text = random.choice(templates[cat]).format(days=random.randint(1, 14))&#13;<br \/>\n    &#13;<br \/>\n    data.append({&#13;<br \/>\n        &#8220;text&#8221;: text,&#13;<br \/>\n        &#8220;account_age_days&#8221;: random.randint(1, 2000),&#13;<br \/>\n        &#8220;prior_tickets&#8221;: random.choices([0, 1, 2, 3, 4, 5], weights=[40, 30, 15, 10, 3, 2])[0],&#13;<br \/>\n        &#8220;label&#8221;: cat&#13;<br \/>\n    })&#13;<br \/>\n&#13;<br \/>\ndf = pd.DataFrame(data)<\/textarea><\/p>\n<div class=\"urvanov-syntax-highlighter-main\" style=\"\">\n<table class=\"crayon-table\">\n<tr class=\"urvanov-syntax-highlighter-row\">\n<td class=\"crayon-nums \" data-settings=\"show\">\n<div class=\"urvanov-syntax-highlighter-nums-content\" style=\"font-size: 12px !important; line-height: 15px !important;\">\n<p>1<\/p>\n<p>2<\/p>\n<p>3<\/p>\n<p>4<\/p>\n<p>5<\/p>\n<p>6<\/p>\n<p>7<\/p>\n<p>8<\/p>\n<p>9<\/p>\n<p>10<\/p>\n<p>11<\/p>\n<p>12<\/p>\n<p>13<\/p>\n<p>14<\/p>\n<p>15<\/p>\n<p>16<\/p>\n<p>17<\/p>\n<p>18<\/p>\n<p>19<\/p>\n<p>20<\/p>\n<p>21<\/p>\n<p>22<\/p>\n<p>23<\/p>\n<p>24<\/p>\n<p>25<\/p>\n<p>26<\/p>\n<p>27<\/p>\n<p>28<\/p>\n<p>29<\/p>\n<p>30<\/p>\n<p>31<\/p>\n<p>32<\/p>\n<p>33<\/p>\n<p>34<\/p>\n<p>35<\/p>\n<p>36<\/p>\n<p>37<\/p>\n<p>38<\/p>\n<p>39<\/p>\n<p>40<\/p>\n<p>41<\/p>\n<p>42<\/p>\n<p>43<\/p>\n<p>44<\/p>\n<p>45<\/p>\n<p>46<\/p>\n<p>47<\/p>\n<p>48<\/p>\n<p>49<\/p>\n<p>50<\/p>\n<p>51<\/p>\n<p>52<\/p>\n<p>53<\/p>\n<\/div>\n<\/td>\n<td class=\"urvanov-syntax-highlighter-code\">\n<div class=\"crayon-pre\" style=\"font-size: 12px !important; line-height: 15px !important; -moz-tab-size:4; -o-tab-size:4; -webkit-tab-size:4; tab-size:4;\">\n<p><span class=\"crayon-e\">import <\/span><span class=\"crayon-e\">random<\/span><\/p>\n<p><span class=\"crayon-e\">import <\/span><span class=\"crayon-e\">time<\/span><\/p>\n<p>\u00a0<\/p>\n<p><span class=\"crayon-v\">random<\/span><span class=\"crayon-sy\">.<\/span><span class=\"crayon-e\">seed<\/span><span class=\"crayon-sy\">(<\/span><span class=\"crayon-cn\">42<\/span><span class=\"crayon-sy\">)<\/span><\/p>\n<p><span class=\"crayon-v\">categories<\/span><span class=\"crayon-h\"> <\/span><span class=\"crayon-o\">=<\/span><span class=\"crayon-h\"> <\/span><span class=\"crayon-sy\">[<\/span><span class=\"crayon-s\">&#8220;access&#8221;<\/span><span class=\"crayon-sy\">,<\/span><span class=\"crayon-h\"> <\/span><span class=\"crayon-s\">&#8220;inquiry&#8221;<\/span><span class=\"crayon-sy\">,<\/span><span class=\"crayon-h\"> <\/span><span class=\"crayon-s\">&#8220;software&#8221;<\/span><span class=\"crayon-sy\">,<\/span><span class=\"crayon-h\"> <\/span><span class=\"crayon-s\">&#8220;billing&#8221;<\/span><span class=\"crayon-sy\">,<\/span><span class=\"crayon-h\"> <\/span><span class=\"crayon-s\">&#8220;hardware&#8221;<\/span><span class=\"crayon-sy\">]<\/span><\/p>\n<p>\u00a0<\/p>\n<p><span class=\"crayon-v\">templates<\/span><span class=\"crayon-h\"> <\/span><span class=\"crayon-o\">=<\/span><span class=\"crayon-h\"> <\/span><span class=\"crayon-sy\">{<\/span><\/p>\n<p><span class=\"crayon-h\">\u00a0\u00a0\u00a0\u00a0<\/span><span class=\"crayon-s\">&#8220;access&#8221;<\/span><span class=\"crayon-o\">:<\/span><span class=\"crayon-h\"> <\/span><span class=\"crayon-sy\">[<\/span><\/p>\n<p><span class=\"crayon-h\">\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0<\/span><span class=\"crayon-s\">&#8220;I&#8217;ve been locked out of my account for {days} days and need urgent help!&#8221;<\/span><span class=\"crayon-sy\">,<\/span><\/p>\n<p><span class=\"crayon-h\">\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0<\/span><span class=\"crayon-s\">&#8220;I can&#8217;t log in, it keeps saying bad password.&#8221;<\/span><span class=\"crayon-sy\">,<\/span><\/p>\n<p><span class=\"crayon-h\">\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0<\/span><span class=\"crayon-s\">&#8220;Reset my access credentials immediately.&#8221;<\/span><span class=\"crayon-sy\">,<\/span><\/p>\n<p><span class=\"crayon-h\">\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0<\/span><span class=\"crayon-s\">&#8220;My 2FA isn&#8217;t working, please help me get into my account.&#8221;<\/span><\/p>\n<p><span class=\"crayon-h\">\u00a0\u00a0\u00a0\u00a0<\/span><span class=\"crayon-sy\">]<\/span><span class=\"crayon-sy\">,<\/span><\/p>\n<p><span class=\"crayon-h\">\u00a0\u00a0\u00a0\u00a0<\/span><span class=\"crayon-s\">&#8220;inquiry&#8221;<\/span><span class=\"crayon-o\">:<\/span><span class=\"crayon-h\"> <\/span><span class=\"crayon-sy\">[<\/span><\/p>\n<p><span class=\"crayon-h\">\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0<\/span><span class=\"crayon-s\">&#8220;When will my new credit card arrive in the mail?&#8221;<\/span><span class=\"crayon-sy\">,<\/span><\/p>\n<p><span class=\"crayon-h\">\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0<\/span><span class=\"crayon-s\">&#8220;Just checking on the status of my recent order.&#8221;<\/span><span class=\"crayon-sy\">,<\/span><\/p>\n<p><span class=\"crayon-h\">\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0<\/span><span class=\"crayon-s\">&#8220;What are your business hours on weekends?&#8221;<\/span><span class=\"crayon-sy\">,<\/span><\/p>\n<p><span class=\"crayon-h\">\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0<\/span><span class=\"crayon-s\">&#8220;Can I upgrade my current plan to the premium tier?&#8221;<\/span><\/p>\n<p><span class=\"crayon-h\">\u00a0\u00a0\u00a0\u00a0<\/span><span class=\"crayon-sy\">]<\/span><span class=\"crayon-sy\">,<\/span><\/p>\n<p><span class=\"crayon-h\">\u00a0\u00a0\u00a0\u00a0<\/span><span class=\"crayon-s\">&#8220;software&#8221;<\/span><span class=\"crayon-o\">:<\/span><span class=\"crayon-h\"> <\/span><span class=\"crayon-sy\">[<\/span><\/p>\n<p><span class=\"crayon-h\">\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0<\/span><span class=\"crayon-s\">&#8220;The app keeps crashing every time I try to view my transaction history.&#8221;<\/span><span class=\"crayon-sy\">,<\/span><\/p>\n<p><span class=\"crayon-h\">\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0<\/span><span class=\"crayon-s\">&#8220;Software bug: the submit button is greyed out.&#8221;<\/span><span class=\"crayon-sy\">,<\/span><\/p>\n<p><span class=\"crayon-h\">\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0<\/span><span class=\"crayon-s\">&#8220;Pages are loading incredibly slowly since the last update.&#8221;<\/span><span class=\"crayon-sy\">,<\/span><\/p>\n<p><span class=\"crayon-h\">\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0<\/span><span class=\"crayon-s\">&#8220;I&#8217;m getting a 500 Internal Server Error on the dashboard.&#8221;<\/span><\/p>\n<p><span class=\"crayon-h\">\u00a0\u00a0\u00a0\u00a0<\/span><span class=\"crayon-sy\">]<\/span><span class=\"crayon-sy\">,<\/span><\/p>\n<p><span class=\"crayon-h\">\u00a0\u00a0\u00a0\u00a0<\/span><span class=\"crayon-s\">&#8220;billing&#8221;<\/span><span class=\"crayon-o\">:<\/span><span class=\"crayon-h\"> <\/span><span class=\"crayon-sy\">[<\/span><\/p>\n<p><span class=\"crayon-h\">\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0<\/span><span class=\"crayon-s\">&#8220;I need a refund for the extra charges on my bill.&#8221;<\/span><span class=\"crayon-sy\">,<\/span><\/p>\n<p><span class=\"crayon-h\">\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0<\/span><span class=\"crayon-s\">&#8220;Why was I billed twice this month?&#8221;<\/span><span class=\"crayon-sy\">,<\/span><\/p>\n<p><span class=\"crayon-h\">\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0<\/span><span class=\"crayon-s\">&#8220;Please update my payment method, the old card expired.&#8221;<\/span><span class=\"crayon-sy\">,<\/span><\/p>\n<p><span class=\"crayon-h\">\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0<\/span><span class=\"crayon-s\">&#8220;I didn&#8217;t authorize this $49.99 transaction.&#8221;<\/span><\/p>\n<p><span class=\"crayon-h\">\u00a0\u00a0\u00a0\u00a0<\/span><span class=\"crayon-sy\">]<\/span><span class=\"crayon-sy\">,<\/span><\/p>\n<p><span class=\"crayon-h\">\u00a0\u00a0\u00a0\u00a0<\/span><span class=\"crayon-s\">&#8220;hardware&#8221;<\/span><span class=\"crayon-o\">:<\/span><span class=\"crayon-h\"> <\/span><span class=\"crayon-sy\">[<\/span><\/p>\n<p><span class=\"crayon-h\">\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0<\/span><span class=\"crayon-s\">&#8220;My hardware token is broken, I can&#8217;t log in.&#8221;<\/span><span class=\"crayon-sy\">,<\/span><\/p>\n<p><span class=\"crayon-h\">\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0<\/span><span class=\"crayon-s\">&#8220;The screen on my physical device is cracked.&#8221;<\/span><span class=\"crayon-sy\">,<\/span><\/p>\n<p><span class=\"crayon-h\">\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0<\/span><span class=\"crayon-s\">&#8220;The card reader isn&#8217;t scanning properly anymore.&#8221;<\/span><span class=\"crayon-sy\">,<\/span><\/p>\n<p><span class=\"crayon-h\">\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0<\/span><span class=\"crayon-s\">&#8220;Battery drains in 10 minutes, I need a replacement unit.&#8221;<\/span><\/p>\n<p><span class=\"crayon-h\">\u00a0\u00a0\u00a0\u00a0<\/span><span class=\"crayon-sy\">]<\/span><\/p>\n<p><span class=\"crayon-sy\">}<\/span><\/p>\n<p>\u00a0<\/p>\n<p><span class=\"crayon-v\">data<\/span><span class=\"crayon-h\"> <\/span><span class=\"crayon-o\">=<\/span><span class=\"crayon-h\"> <\/span><span class=\"crayon-sy\">[<\/span><span class=\"crayon-sy\">]<\/span><\/p>\n<p><span class=\"crayon-st\">for<\/span><span class=\"crayon-h\"> <\/span><span class=\"crayon-i\">_<\/span><span class=\"crayon-h\"> <\/span><span class=\"crayon-st\">in<\/span><span class=\"crayon-h\"> <\/span><span class=\"crayon-e\">range<\/span><span class=\"crayon-sy\">(<\/span><span class=\"crayon-cn\">100<\/span><span class=\"crayon-sy\">)<\/span><span class=\"crayon-o\">:<\/span><\/p>\n<p><span class=\"crayon-h\">\u00a0\u00a0\u00a0\u00a0<\/span><span class=\"crayon-v\">cat<\/span><span class=\"crayon-h\"> <\/span><span class=\"crayon-o\">=<\/span><span class=\"crayon-h\"> <\/span><span class=\"crayon-v\">random<\/span><span class=\"crayon-sy\">.<\/span><span class=\"crayon-e\">choice<\/span><span class=\"crayon-sy\">(<\/span><span class=\"crayon-v\">categories<\/span><span class=\"crayon-sy\">)<\/span><\/p>\n<p><span class=\"crayon-h\">\u00a0\u00a0\u00a0\u00a0<\/span><span class=\"crayon-p\"># Injecting a random number of days into specific templates to foster variety<\/span><\/p>\n<p><span class=\"crayon-h\">\u00a0\u00a0\u00a0\u00a0<\/span><span class=\"crayon-v\">text<\/span><span class=\"crayon-h\"> <\/span><span class=\"crayon-o\">=<\/span><span class=\"crayon-h\"> <\/span><span class=\"crayon-v\">random<\/span><span class=\"crayon-sy\">.<\/span><span class=\"crayon-e\">choice<\/span><span class=\"crayon-sy\">(<\/span><span class=\"crayon-v\">templates<\/span><span class=\"crayon-sy\">[<\/span><span class=\"crayon-v\">cat<\/span><span class=\"crayon-sy\">]<\/span><span class=\"crayon-sy\">)<\/span><span class=\"crayon-sy\">.<\/span><span class=\"crayon-e\">format<\/span><span class=\"crayon-sy\">(<\/span><span class=\"crayon-v\">days<\/span><span class=\"crayon-o\">=<\/span><span class=\"crayon-v\">random<\/span><span class=\"crayon-sy\">.<\/span><span class=\"crayon-e\">randint<\/span><span class=\"crayon-sy\">(<\/span><span class=\"crayon-cn\">1<\/span><span class=\"crayon-sy\">,<\/span><span class=\"crayon-h\"> <\/span><span class=\"crayon-cn\">14<\/span><span class=\"crayon-sy\">)<\/span><span class=\"crayon-sy\">)<\/span><\/p>\n<p><span class=\"crayon-h\">\u00a0\u00a0\u00a0\u00a0<\/span><\/p>\n<p><span class=\"crayon-h\">\u00a0\u00a0\u00a0\u00a0<\/span><span class=\"crayon-v\">data<\/span><span class=\"crayon-sy\">.<\/span><span class=\"crayon-e\">append<\/span><span class=\"crayon-sy\">(<\/span><span class=\"crayon-sy\">{<\/span><\/p>\n<p><span class=\"crayon-h\">\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0<\/span><span class=\"crayon-s\">&#8220;text&#8221;<\/span><span class=\"crayon-o\">:<\/span><span class=\"crayon-h\"> <\/span><span class=\"crayon-v\">text<\/span><span class=\"crayon-sy\">,<\/span><\/p>\n<p><span class=\"crayon-h\">\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0<\/span><span class=\"crayon-s\">&#8220;account_age_days&#8221;<\/span><span class=\"crayon-o\">:<\/span><span class=\"crayon-h\"> <\/span><span class=\"crayon-v\">random<\/span><span class=\"crayon-sy\">.<\/span><span class=\"crayon-e\">randint<\/span><span class=\"crayon-sy\">(<\/span><span class=\"crayon-cn\">1<\/span><span class=\"crayon-sy\">,<\/span><span class=\"crayon-h\"> <\/span><span class=\"crayon-cn\">2000<\/span><span class=\"crayon-sy\">)<\/span><span class=\"crayon-sy\">,<\/span><\/p>\n<p><span class=\"crayon-h\">\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0<\/span><span class=\"crayon-s\">&#8220;prior_tickets&#8221;<\/span><span class=\"crayon-o\">:<\/span><span class=\"crayon-h\"> <\/span><span class=\"crayon-v\">random<\/span><span class=\"crayon-sy\">.<\/span><span class=\"crayon-e\">choices<\/span><span class=\"crayon-sy\">(<\/span><span class=\"crayon-sy\">[<\/span><span class=\"crayon-cn\">0<\/span><span class=\"crayon-sy\">,<\/span><span class=\"crayon-h\"> <\/span><span class=\"crayon-cn\">1<\/span><span class=\"crayon-sy\">,<\/span><span class=\"crayon-h\"> <\/span><span class=\"crayon-cn\">2<\/span><span class=\"crayon-sy\">,<\/span><span class=\"crayon-h\"> <\/span><span class=\"crayon-cn\">3<\/span><span class=\"crayon-sy\">,<\/span><span class=\"crayon-h\"> <\/span><span class=\"crayon-cn\">4<\/span><span class=\"crayon-sy\">,<\/span><span class=\"crayon-h\"> <\/span><span class=\"crayon-cn\">5<\/span><span class=\"crayon-sy\">]<\/span><span class=\"crayon-sy\">,<\/span><span class=\"crayon-h\"> <\/span><span class=\"crayon-v\">weights<\/span><span class=\"crayon-o\">=<\/span><span class=\"crayon-sy\">[<\/span><span class=\"crayon-cn\">40<\/span><span class=\"crayon-sy\">,<\/span><span class=\"crayon-h\"> <\/span><span class=\"crayon-cn\">30<\/span><span class=\"crayon-sy\">,<\/span><span class=\"crayon-h\"> <\/span><span class=\"crayon-cn\">15<\/span><span class=\"crayon-sy\">,<\/span><span class=\"crayon-h\"> <\/span><span class=\"crayon-cn\">10<\/span><span class=\"crayon-sy\">,<\/span><span class=\"crayon-h\"> <\/span><span class=\"crayon-cn\">3<\/span><span class=\"crayon-sy\">,<\/span><span class=\"crayon-h\"> <\/span><span class=\"crayon-cn\">2<\/span><span class=\"crayon-sy\">]<\/span><span class=\"crayon-sy\">)<\/span><span class=\"crayon-sy\">[<\/span><span class=\"crayon-cn\">0<\/span><span class=\"crayon-sy\">]<\/span><span class=\"crayon-sy\">,<\/span><\/p>\n<p><span class=\"crayon-h\">\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0<\/span><span class=\"crayon-s\">&#8220;label&#8221;<\/span><span class=\"crayon-o\">:<\/span><span class=\"crayon-h\"> <\/span><span class=\"crayon-i\">cat<\/span><\/p>\n<p><span class=\"crayon-h\">\u00a0\u00a0\u00a0\u00a0<\/span><span class=\"crayon-sy\">}<\/span><span class=\"crayon-sy\">)<\/span><\/p>\n<p>\u00a0<\/p>\n<p><span class=\"crayon-v\">df<\/span><span class=\"crayon-h\"> <\/span><span class=\"crayon-o\">=<\/span><span class=\"crayon-h\"> <\/span><span class=\"crayon-v\">pd<\/span><span class=\"crayon-sy\">.<\/span><span class=\"crayon-e\">DataFrame<\/span><span class=\"crayon-sy\">(<\/span><span class=\"crayon-v\">data<\/span><span class=\"crayon-sy\">)<\/span><\/p>\n<\/div>\n<\/td>\n<\/tr>\n<\/table><\/div>\n<\/p><\/div>\n<p>The dataset generated contains customer support tickets, combining text descriptions with structured numeric features like account age and number of prior tickets, as well as a class label spanning several ticket categories. These labels will later be used for training and evaluating a classification model at the end of the process.<\/p>\n<h2>Extracting LLM Features<\/h2>\n<p>Next, we define the desired tabular features we want to extract from the text. The choice of features is domain-dependent and fully customizable, but you will use the LLM later on to extract these fields in a consistent, structured format:<\/p>\n<div id=\"urvanov-syntax-highlighter-69b06e58d1026938552081\" class=\"urvanov-syntax-highlighter-syntax crayon-theme-classic urvanov-syntax-highlighter-font-monaco urvanov-syntax-highlighter-os-pc print-yes notranslate\" data-settings=\" minimize scroll-mouseover disable-anim\" style=\" margin-top: 12px; margin-bottom: 12px; font-size: 12px !important; line-height: 15px !important;\">\n<p><textarea wrap=\"soft\" class=\"urvanov-syntax-highlighter-plain print-no\" data-settings=\"dblclick\" readonly=\"readonly\" style=\"-moz-tab-size:4; -o-tab-size:4; -webkit-tab-size:4; tab-size:4; font-size: 12px !important; line-height: 15px !important;\"><br \/>\nclass TicketFeatures(BaseModel):&#13;<br \/>\n    urgency_score: int = Field(description=&#8221;Urgency of the ticket on a scale of 1 to 5&#8243;)&#13;<br \/>\n    is_frustrated: int = Field(description=&#8221;1 if the user expresses frustration, 0 otherwise&#8221;)<\/textarea><\/p>\n<div class=\"urvanov-syntax-highlighter-main\" style=\"\">\n<table class=\"crayon-table\">\n<tr class=\"urvanov-syntax-highlighter-row\">\n<td class=\"crayon-nums \" data-settings=\"show\">\n<\/td>\n<td class=\"urvanov-syntax-highlighter-code\">\n<div class=\"crayon-pre\" style=\"font-size: 12px !important; line-height: 15px !important; -moz-tab-size:4; -o-tab-size:4; -webkit-tab-size:4; tab-size:4;\">\n<p><span class=\"crayon-t\">class<\/span><span class=\"crayon-h\"> <\/span><span class=\"crayon-e\">TicketFeatures<\/span><span class=\"crayon-sy\">(<\/span><span class=\"crayon-v\">BaseModel<\/span><span class=\"crayon-sy\">)<\/span><span class=\"crayon-o\">:<\/span><\/p>\n<p><span class=\"crayon-h\">\u00a0\u00a0\u00a0\u00a0<\/span><span class=\"crayon-v\">urgency_score<\/span><span class=\"crayon-o\">:<\/span><span class=\"crayon-h\"> <\/span><span class=\"crayon-t\">int<\/span><span class=\"crayon-h\"> <\/span><span class=\"crayon-o\">=<\/span><span class=\"crayon-h\"> <\/span><span class=\"crayon-e\">Field<\/span><span class=\"crayon-sy\">(<\/span><span class=\"crayon-v\">description<\/span><span class=\"crayon-o\">=<\/span><span class=\"crayon-s\">&#8220;Urgency of the ticket on a scale of 1 to 5&#8221;<\/span><span class=\"crayon-sy\">)<\/span><\/p>\n<p><span class=\"crayon-h\">\u00a0\u00a0\u00a0\u00a0<\/span><span class=\"crayon-v\">is_frustrated<\/span><span class=\"crayon-o\">:<\/span><span class=\"crayon-h\"> <\/span><span class=\"crayon-t\">int<\/span><span class=\"crayon-h\"> <\/span><span class=\"crayon-o\">=<\/span><span class=\"crayon-h\"> <\/span><span class=\"crayon-e\">Field<\/span><span class=\"crayon-sy\">(<\/span><span class=\"crayon-v\">description<\/span><span class=\"crayon-o\">=<\/span><span class=\"crayon-s\">&#8220;1 if the user expresses frustration, 0 otherwise&#8221;<\/span><span class=\"crayon-sy\">)<\/span><\/p>\n<\/div>\n<\/td>\n<\/tr>\n<\/table><\/div>\n<\/p><\/div>\n<p>For example, urgency and frustration often correlate with specific ticket types (e.g. access lockouts and outages tend to be more urgent and emotionally charged than general inquiries), so these signals can help a downstream classifier separate categories more effectively than raw text alone.<\/p>\n<p>The next function is a key element of the process, as it encapsulates the LLM integration needed to transform a ticket\u2019s text into a JSON object that matches our schema.<\/p>\n<div id=\"urvanov-syntax-highlighter-69b06e58d102f248209436\" class=\"urvanov-syntax-highlighter-syntax crayon-theme-classic urvanov-syntax-highlighter-font-monaco urvanov-syntax-highlighter-os-pc print-yes notranslate\" data-settings=\" minimize scroll-mouseover disable-anim\" style=\" margin-top: 12px; margin-bottom: 12px; font-size: 12px !important; line-height: 15px !important;\">\n<p><textarea wrap=\"soft\" class=\"urvanov-syntax-highlighter-plain print-no\" data-settings=\"dblclick\" readonly=\"readonly\" style=\"-moz-tab-size:4; -o-tab-size:4; -webkit-tab-size:4; tab-size:4; font-size: 12px !important; line-height: 15px !important;\"><br \/>\ndef extract_features(text: str) -&gt; dict:&#13;<br \/>\n    # Sleep for 2.5 seconds for safer use under the constraints of the 30 RPM free-tier limit&#13;<br \/>\n    time.sleep(2.5) &#13;<br \/>\n    &#13;<br \/>\n    schema_instructions = json.dumps(TicketFeatures.model_json_schema())&#13;<br \/>\n    response = client.chat.completions.create(&#13;<br \/>\n        model=&#8221;llama-3.3-70b-versatile&#8221;,&#13;<br \/>\n        messages=[&#13;<br \/>\n            {&#13;<br \/>\n                &#8220;role&#8221;: &#8220;system&#8221;, &#13;<br \/>\n                &#8220;content&#8221;: f&#8221;You are an extraction assistant. Output ONLY valid JSON matching this schema: {schema_instructions}&#8221;&#13;<br \/>\n            },&#13;<br \/>\n            {&#8220;role&#8221;: &#8220;user&#8221;, &#8220;content&#8221;: text}&#13;<br \/>\n        ],&#13;<br \/>\n        response_format={&#8220;type&#8221;: &#8220;json_object&#8221;},&#13;<br \/>\n        temperature=0.0 &#13;<br \/>\n    )&#13;<br \/>\n    return json.loads(response.choices[0].message.content)<\/textarea><\/p>\n<div class=\"urvanov-syntax-highlighter-main\" style=\"\">\n<table class=\"crayon-table\">\n<tr class=\"urvanov-syntax-highlighter-row\">\n<td class=\"crayon-nums \" data-settings=\"show\">\n<div class=\"urvanov-syntax-highlighter-nums-content\" style=\"font-size: 12px !important; line-height: 15px !important;\">\n<p>1<\/p>\n<p>2<\/p>\n<p>3<\/p>\n<p>4<\/p>\n<p>5<\/p>\n<p>6<\/p>\n<p>7<\/p>\n<p>8<\/p>\n<p>9<\/p>\n<p>10<\/p>\n<p>11<\/p>\n<p>12<\/p>\n<p>13<\/p>\n<p>14<\/p>\n<p>15<\/p>\n<p>16<\/p>\n<p>17<\/p>\n<p>18<\/p>\n<\/div>\n<\/td>\n<td class=\"urvanov-syntax-highlighter-code\">\n<div class=\"crayon-pre\" style=\"font-size: 12px !important; line-height: 15px !important; -moz-tab-size:4; -o-tab-size:4; -webkit-tab-size:4; tab-size:4;\">\n<p><span class=\"crayon-e\">def <\/span><span class=\"crayon-e\">extract_features<\/span><span class=\"crayon-sy\">(<\/span><span class=\"crayon-v\">text<\/span><span class=\"crayon-o\">:<\/span><span class=\"crayon-h\"> <\/span><span class=\"crayon-v\">str<\/span><span class=\"crayon-sy\">)<\/span><span class=\"crayon-h\"> <\/span><span class=\"crayon-o\">-&gt;<\/span><span class=\"crayon-h\"> <\/span><span class=\"crayon-v\">dict<\/span><span class=\"crayon-o\">:<\/span><\/p>\n<p><span class=\"crayon-h\">\u00a0\u00a0\u00a0\u00a0<\/span><span class=\"crayon-p\"># Sleep for 2.5 seconds for safer use under the constraints of the 30 RPM free-tier limit<\/span><\/p>\n<p><span class=\"crayon-h\">\u00a0\u00a0\u00a0\u00a0<\/span><span class=\"crayon-v\">time<\/span><span class=\"crayon-sy\">.<\/span><span class=\"crayon-e\">sleep<\/span><span class=\"crayon-sy\">(<\/span><span class=\"crayon-cn\">2.5<\/span><span class=\"crayon-sy\">)<\/span><span class=\"crayon-h\"> <\/span><\/p>\n<p><span class=\"crayon-h\">\u00a0\u00a0\u00a0\u00a0<\/span><\/p>\n<p><span class=\"crayon-h\">\u00a0\u00a0\u00a0\u00a0<\/span><span class=\"crayon-v\">schema_instructions<\/span><span class=\"crayon-h\"> <\/span><span class=\"crayon-o\">=<\/span><span class=\"crayon-h\"> <\/span><span class=\"crayon-v\">json<\/span><span class=\"crayon-sy\">.<\/span><span class=\"crayon-e\">dumps<\/span><span class=\"crayon-sy\">(<\/span><span class=\"crayon-v\">TicketFeatures<\/span><span class=\"crayon-sy\">.<\/span><span class=\"crayon-e\">model_json_schema<\/span><span class=\"crayon-sy\">(<\/span><span class=\"crayon-sy\">)<\/span><span class=\"crayon-sy\">)<\/span><\/p>\n<p><span class=\"crayon-h\">\u00a0\u00a0\u00a0\u00a0<\/span><span class=\"crayon-v\">response<\/span><span class=\"crayon-h\"> <\/span><span class=\"crayon-o\">=<\/span><span class=\"crayon-h\"> <\/span><span class=\"crayon-v\">client<\/span><span class=\"crayon-sy\">.<\/span><span class=\"crayon-v\">chat<\/span><span class=\"crayon-sy\">.<\/span><span class=\"crayon-v\">completions<\/span><span class=\"crayon-sy\">.<\/span><span class=\"crayon-e\">create<\/span><span class=\"crayon-sy\">(<\/span><\/p>\n<p><span class=\"crayon-h\">\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0<\/span><span class=\"crayon-v\">model<\/span><span class=\"crayon-o\">=<\/span><span class=\"crayon-s\">&#8220;llama-3.3-70b-versatile&#8221;<\/span><span class=\"crayon-sy\">,<\/span><\/p>\n<p><span class=\"crayon-h\">\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0<\/span><span class=\"crayon-v\">messages<\/span><span class=\"crayon-o\">=<\/span><span class=\"crayon-sy\">[<\/span><\/p>\n<p><span class=\"crayon-h\">\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0<\/span><span class=\"crayon-sy\">{<\/span><\/p>\n<p><span class=\"crayon-h\">\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0<\/span><span class=\"crayon-s\">&#8220;role&#8221;<\/span><span class=\"crayon-o\">:<\/span><span class=\"crayon-h\"> <\/span><span class=\"crayon-s\">&#8220;system&#8221;<\/span><span class=\"crayon-sy\">,<\/span><span class=\"crayon-h\"> <\/span><\/p>\n<p><span class=\"crayon-h\">\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0<\/span><span class=\"crayon-s\">&#8220;content&#8221;<\/span><span class=\"crayon-o\">:<\/span><span class=\"crayon-h\"> <\/span><span class=\"crayon-i\">f<\/span><span class=\"crayon-s\">&#8220;You are an extraction assistant. Output ONLY valid JSON matching this schema: {schema_instructions}&#8221;<\/span><\/p>\n<p><span class=\"crayon-h\">\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0<\/span><span class=\"crayon-sy\">}<\/span><span class=\"crayon-sy\">,<\/span><\/p>\n<p><span class=\"crayon-h\">\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0<\/span><span class=\"crayon-sy\">{<\/span><span class=\"crayon-s\">&#8220;role&#8221;<\/span><span class=\"crayon-o\">:<\/span><span class=\"crayon-h\"> <\/span><span class=\"crayon-s\">&#8220;user&#8221;<\/span><span class=\"crayon-sy\">,<\/span><span class=\"crayon-h\"> <\/span><span class=\"crayon-s\">&#8220;content&#8221;<\/span><span class=\"crayon-o\">:<\/span><span class=\"crayon-h\"> <\/span><span class=\"crayon-v\">text<\/span><span class=\"crayon-sy\">}<\/span><\/p>\n<p><span class=\"crayon-h\">\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0<\/span><span class=\"crayon-sy\">]<\/span><span class=\"crayon-sy\">,<\/span><\/p>\n<p><span class=\"crayon-h\">\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0<\/span><span class=\"crayon-v\">response_format<\/span><span class=\"crayon-o\">=<\/span><span class=\"crayon-sy\">{<\/span><span class=\"crayon-s\">&#8220;type&#8221;<\/span><span class=\"crayon-o\">:<\/span><span class=\"crayon-h\"> <\/span><span class=\"crayon-s\">&#8220;json_object&#8221;<\/span><span class=\"crayon-sy\">}<\/span><span class=\"crayon-sy\">,<\/span><\/p>\n<p><span class=\"crayon-h\">\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0<\/span><span class=\"crayon-v\">temperature<\/span><span class=\"crayon-o\">=<\/span><span class=\"crayon-cn\">0.0<\/span><span class=\"crayon-h\"> <\/span><\/p>\n<p><span class=\"crayon-h\">\u00a0\u00a0\u00a0\u00a0<\/span><span class=\"crayon-sy\">)<\/span><\/p>\n<p><span class=\"crayon-h\">\u00a0\u00a0\u00a0\u00a0<\/span><span class=\"crayon-st\">return<\/span><span class=\"crayon-h\"> <\/span><span class=\"crayon-v\">json<\/span><span class=\"crayon-sy\">.<\/span><span class=\"crayon-e\">loads<\/span><span class=\"crayon-sy\">(<\/span><span class=\"crayon-v\">response<\/span><span class=\"crayon-sy\">.<\/span><span class=\"crayon-v\">choices<\/span><span class=\"crayon-sy\">[<\/span><span class=\"crayon-cn\">0<\/span><span class=\"crayon-sy\">]<\/span><span class=\"crayon-sy\">.<\/span><span class=\"crayon-v\">message<\/span><span class=\"crayon-sy\">.<\/span><span class=\"crayon-v\">content<\/span><span class=\"crayon-sy\">)<\/span><\/p>\n<\/div>\n<\/td>\n<\/tr>\n<\/table><\/div>\n<\/p><\/div>\n<p>Why does the function return JSON objects? First, JSON is a reliable way to ask an LLM to produce structured outputs. Second, JSON objects can be easily converted into Pandas <code>Series<\/code> objects, which can then be seamlessly merged with other columns of an existing <code>DataFrame<\/code> to become new ones. The following instructions do the trick and append the new features, stored in <code>engineered_features<\/code>, to the rest of the original dataset:<\/p>\n<div id=\"urvanov-syntax-highlighter-69b06e58d1033816444638\" class=\"urvanov-syntax-highlighter-syntax crayon-theme-classic urvanov-syntax-highlighter-font-monaco urvanov-syntax-highlighter-os-pc print-yes notranslate\" data-settings=\" minimize scroll-mouseover disable-anim\" style=\" margin-top: 12px; margin-bottom: 12px; font-size: 12px !important; line-height: 15px !important;\">\n<p><textarea wrap=\"soft\" class=\"urvanov-syntax-highlighter-plain print-no\" data-settings=\"dblclick\" readonly=\"readonly\" style=\"-moz-tab-size:4; -o-tab-size:4; -webkit-tab-size:4; tab-size:4; font-size: 12px !important; line-height: 15px !important;\"><br \/>\nprint(&#8220;1. Extracting structured features from text using LLM&#8230;&#8221;)&#13;<br \/>\nengineered_features = df[&#8220;text&#8221;].apply(extract_features)&#13;<br \/>\nfeatures_df = pd.DataFrame(engineered_features.tolist())&#13;<br \/>\n&#13;<br \/>\nX_raw = pd.concat([df.drop(columns=[&#8220;text&#8221;, &#8220;label&#8221;]), features_df], axis=1)&#13;<br \/>\ny = df[&#8220;label&#8221;]&#13;<br \/>\n&#13;<br \/>\nprint(&#8220;\\n2. Final Engineered Tabular Dataset:&#8221;)&#13;<br \/>\nprint(X_raw)<\/textarea><\/p>\n<div class=\"urvanov-syntax-highlighter-main\" style=\"\">\n<table class=\"crayon-table\">\n<tr class=\"urvanov-syntax-highlighter-row\">\n<td class=\"crayon-nums \" data-settings=\"show\">\n<\/td>\n<td class=\"urvanov-syntax-highlighter-code\">\n<div class=\"crayon-pre\" style=\"font-size: 12px !important; line-height: 15px !important; -moz-tab-size:4; -o-tab-size:4; -webkit-tab-size:4; tab-size:4;\">\n<p><span class=\"crayon-e\">print<\/span><span class=\"crayon-sy\">(<\/span><span class=\"crayon-s\">&#8220;1. Extracting structured features from text using LLM&#8230;&#8221;<\/span><span class=\"crayon-sy\">)<\/span><\/p>\n<p><span class=\"crayon-v\">engineered_features<\/span><span class=\"crayon-h\"> <\/span><span class=\"crayon-o\">=<\/span><span class=\"crayon-h\"> <\/span><span class=\"crayon-v\">df<\/span><span class=\"crayon-sy\">[<\/span><span class=\"crayon-s\">&#8220;text&#8221;<\/span><span class=\"crayon-sy\">]<\/span><span class=\"crayon-sy\">.<\/span><span class=\"crayon-e\">apply<\/span><span class=\"crayon-sy\">(<\/span><span class=\"crayon-v\">extract_features<\/span><span class=\"crayon-sy\">)<\/span><\/p>\n<p><span class=\"crayon-v\">features_df<\/span><span class=\"crayon-h\"> <\/span><span class=\"crayon-o\">=<\/span><span class=\"crayon-h\"> <\/span><span class=\"crayon-v\">pd<\/span><span class=\"crayon-sy\">.<\/span><span class=\"crayon-e\">DataFrame<\/span><span class=\"crayon-sy\">(<\/span><span class=\"crayon-v\">engineered_features<\/span><span class=\"crayon-sy\">.<\/span><span class=\"crayon-e\">tolist<\/span><span class=\"crayon-sy\">(<\/span><span class=\"crayon-sy\">)<\/span><span class=\"crayon-sy\">)<\/span><\/p>\n<p>\u00a0<\/p>\n<p><span class=\"crayon-v\">X_raw<\/span><span class=\"crayon-h\"> <\/span><span class=\"crayon-o\">=<\/span><span class=\"crayon-h\"> <\/span><span class=\"crayon-v\">pd<\/span><span class=\"crayon-sy\">.<\/span><span class=\"crayon-e\">concat<\/span><span class=\"crayon-sy\">(<\/span><span class=\"crayon-sy\">[<\/span><span class=\"crayon-v\">df<\/span><span class=\"crayon-sy\">.<\/span><span class=\"crayon-e\">drop<\/span><span class=\"crayon-sy\">(<\/span><span class=\"crayon-v\">columns<\/span><span class=\"crayon-o\">=<\/span><span class=\"crayon-sy\">[<\/span><span class=\"crayon-s\">&#8220;text&#8221;<\/span><span class=\"crayon-sy\">,<\/span><span class=\"crayon-h\"> <\/span><span class=\"crayon-s\">&#8220;label&#8221;<\/span><span class=\"crayon-sy\">]<\/span><span class=\"crayon-sy\">)<\/span><span class=\"crayon-sy\">,<\/span><span class=\"crayon-h\"> <\/span><span class=\"crayon-v\">features_df<\/span><span class=\"crayon-sy\">]<\/span><span class=\"crayon-sy\">,<\/span><span class=\"crayon-h\"> <\/span><span class=\"crayon-v\">axis<\/span><span class=\"crayon-o\">=<\/span><span class=\"crayon-cn\">1<\/span><span class=\"crayon-sy\">)<\/span><\/p>\n<p><span class=\"crayon-v\">y<\/span><span class=\"crayon-h\"> <\/span><span class=\"crayon-o\">=<\/span><span class=\"crayon-h\"> <\/span><span class=\"crayon-v\">df<\/span><span class=\"crayon-sy\">[<\/span><span class=\"crayon-s\">&#8220;label&#8221;<\/span><span class=\"crayon-sy\">]<\/span><\/p>\n<p>\u00a0<\/p>\n<p><span class=\"crayon-e\">print<\/span><span class=\"crayon-sy\">(<\/span><span class=\"crayon-s\">&#8220;\\n2. Final Engineered Tabular Dataset:&#8221;<\/span><span class=\"crayon-sy\">)<\/span><\/p>\n<p><span class=\"crayon-e\">print<\/span><span class=\"crayon-sy\">(<\/span><span class=\"crayon-v\">X_raw<\/span><span class=\"crayon-sy\">)<\/span><\/p>\n<\/div>\n<\/td>\n<\/tr>\n<\/table><\/div>\n<\/p><\/div>\n<p>Here is what the resulting tabular data looks like:<\/p>\n<div id=\"urvanov-syntax-highlighter-69b06e58d103b903092114\" class=\"urvanov-syntax-highlighter-syntax crayon-theme-classic urvanov-syntax-highlighter-font-monaco urvanov-syntax-highlighter-os-pc print-yes notranslate\" data-settings=\" minimize scroll-mouseover disable-anim\" style=\" margin-top: 12px; margin-bottom: 12px; font-size: 12px !important; line-height: 15px !important;\">\n<p><textarea wrap=\"soft\" class=\"urvanov-syntax-highlighter-plain print-no\" data-settings=\"dblclick\" readonly=\"readonly\" style=\"-moz-tab-size:4; -o-tab-size:4; -webkit-tab-size:4; tab-size:4; font-size: 12px !important; line-height: 15px !important;\"><br \/>\n          account_age_days  prior_tickets  urgency_score  is_frustrated&#13;<br \/>\n0                564              0              5              1&#13;<br \/>\n1               1517              3              4              0&#13;<br \/>\n2                 62              0              5              1&#13;<br \/>\n3                408              2              4              0&#13;<br \/>\n4                920              1              5              1&#13;<br \/>\n..               &#8230;            &#8230;            &#8230;            &#8230;&#13;<br \/>\n95                91              2              4              1&#13;<br \/>\n96               884              0              4              1&#13;<br \/>\n97              1737              0              5              1&#13;<br \/>\n98               837              0              5              1&#13;<br \/>\n99               862              1              4              1&#13;<br \/>\n&#13;<br \/>\n[100 rows x 4 columns]<\/textarea><\/p>\n<div class=\"urvanov-syntax-highlighter-main\" style=\"\">\n<table class=\"crayon-table\">\n<tr class=\"urvanov-syntax-highlighter-row\">\n<td class=\"crayon-nums \" data-settings=\"show\">\n<\/td>\n<td class=\"urvanov-syntax-highlighter-code\">\n<div class=\"crayon-pre\" style=\"font-size: 12px !important; line-height: 15px !important; -moz-tab-size:4; -o-tab-size:4; -webkit-tab-size:4; tab-size:4;\">\n<p><span class=\"crayon-h\">\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0<\/span><span class=\"crayon-e\">account_age_days\u00a0\u00a0<\/span><span class=\"crayon-e\">prior_tickets\u00a0\u00a0<\/span><span class=\"crayon-e\">urgency_score\u00a0\u00a0<\/span><span class=\"crayon-v\">is<\/span><span class=\"crayon-sy\">_<\/span>frustrated<\/p>\n<p><span class=\"crayon-cn\">0<\/span><span class=\"crayon-h\">\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0<\/span><span class=\"crayon-cn\">564<\/span><span class=\"crayon-h\">\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0<\/span><span class=\"crayon-cn\">0<\/span><span class=\"crayon-h\">\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0<\/span><span class=\"crayon-cn\">5<\/span><span class=\"crayon-h\">\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0<\/span><span class=\"crayon-cn\">1<\/span><\/p>\n<p><span class=\"crayon-cn\">1<\/span><span class=\"crayon-h\">\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0 <\/span><span class=\"crayon-cn\">1517<\/span><span class=\"crayon-h\">\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0<\/span><span class=\"crayon-cn\">3<\/span><span class=\"crayon-h\">\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0<\/span><span class=\"crayon-cn\">4<\/span><span class=\"crayon-h\">\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0<\/span><span class=\"crayon-cn\">0<\/span><\/p>\n<p><span class=\"crayon-cn\">2<\/span><span class=\"crayon-h\">\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0 <\/span><span class=\"crayon-cn\">62<\/span><span class=\"crayon-h\">\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0<\/span><span class=\"crayon-cn\">0<\/span><span class=\"crayon-h\">\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0<\/span><span class=\"crayon-cn\">5<\/span><span class=\"crayon-h\">\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0<\/span><span class=\"crayon-cn\">1<\/span><\/p>\n<p><span class=\"crayon-cn\">3<\/span><span class=\"crayon-h\">\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0<\/span><span class=\"crayon-cn\">408<\/span><span class=\"crayon-h\">\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0<\/span><span class=\"crayon-cn\">2<\/span><span class=\"crayon-h\">\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0<\/span><span class=\"crayon-cn\">4<\/span><span class=\"crayon-h\">\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0<\/span><span class=\"crayon-cn\">0<\/span><\/p>\n<p><span class=\"crayon-cn\">4<\/span><span class=\"crayon-h\">\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0<\/span><span class=\"crayon-cn\">920<\/span><span class=\"crayon-h\">\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0<\/span><span class=\"crayon-cn\">1<\/span><span class=\"crayon-h\">\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0<\/span><span class=\"crayon-cn\">5<\/span><span class=\"crayon-h\">\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0<\/span><span class=\"crayon-cn\">1<\/span><\/p>\n<p><span class=\"crayon-sy\">.<\/span><span class=\"crayon-sy\">.<\/span><span class=\"crayon-h\">\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0 <\/span><span class=\"crayon-sy\">.<\/span><span class=\"crayon-sy\">.<\/span><span class=\"crayon-sy\">.<\/span><span class=\"crayon-h\">\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0<\/span><span class=\"crayon-sy\">.<\/span><span class=\"crayon-sy\">.<\/span><span class=\"crayon-sy\">.<\/span><span class=\"crayon-h\">\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0<\/span><span class=\"crayon-sy\">.<\/span><span class=\"crayon-sy\">.<\/span><span class=\"crayon-sy\">.<\/span><span class=\"crayon-h\">\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0<\/span><span class=\"crayon-sy\">.<\/span><span class=\"crayon-sy\">.<\/span><span class=\"crayon-sy\">.<\/span><\/p>\n<p><span class=\"crayon-cn\">95<\/span><span class=\"crayon-h\">\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0<\/span><span class=\"crayon-cn\">91<\/span><span class=\"crayon-h\">\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0<\/span><span class=\"crayon-cn\">2<\/span><span class=\"crayon-h\">\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0<\/span><span class=\"crayon-cn\">4<\/span><span class=\"crayon-h\">\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0<\/span><span class=\"crayon-cn\">1<\/span><\/p>\n<p><span class=\"crayon-cn\">96<\/span><span class=\"crayon-h\">\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0 <\/span><span class=\"crayon-cn\">884<\/span><span class=\"crayon-h\">\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0<\/span><span class=\"crayon-cn\">0<\/span><span class=\"crayon-h\">\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0<\/span><span class=\"crayon-cn\">4<\/span><span class=\"crayon-h\">\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0<\/span><span class=\"crayon-cn\">1<\/span><\/p>\n<p><span class=\"crayon-cn\">97<\/span><span class=\"crayon-h\">\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0<\/span><span class=\"crayon-cn\">1737<\/span><span class=\"crayon-h\">\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0<\/span><span class=\"crayon-cn\">0<\/span><span class=\"crayon-h\">\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0<\/span><span class=\"crayon-cn\">5<\/span><span class=\"crayon-h\">\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0<\/span><span class=\"crayon-cn\">1<\/span><\/p>\n<p><span class=\"crayon-cn\">98<\/span><span class=\"crayon-h\">\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0 <\/span><span class=\"crayon-cn\">837<\/span><span class=\"crayon-h\">\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0<\/span><span class=\"crayon-cn\">0<\/span><span class=\"crayon-h\">\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0<\/span><span class=\"crayon-cn\">5<\/span><span class=\"crayon-h\">\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0<\/span><span class=\"crayon-cn\">1<\/span><\/p>\n<p><span class=\"crayon-cn\">99<\/span><span class=\"crayon-h\">\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0 <\/span><span class=\"crayon-cn\">862<\/span><span class=\"crayon-h\">\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0<\/span><span class=\"crayon-cn\">1<\/span><span class=\"crayon-h\">\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0<\/span><span class=\"crayon-cn\">4<\/span><span class=\"crayon-h\">\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0<\/span><span class=\"crayon-cn\">1<\/span><\/p>\n<p>\u00a0<\/p>\n<p><span class=\"crayon-sy\">[<\/span><span class=\"crayon-cn\">100<\/span><span class=\"crayon-h\"> <\/span><span class=\"crayon-i\">rows<\/span><span class=\"crayon-h\"> <\/span><span class=\"crayon-i\">x<\/span><span class=\"crayon-h\"> <\/span><span class=\"crayon-cn\">4<\/span><span class=\"crayon-h\"> <\/span><span class=\"crayon-v\">columns<\/span><span class=\"crayon-sy\">]<\/span><\/p>\n<\/div>\n<\/td>\n<\/tr>\n<\/table><\/div>\n<\/p><\/div>\n<p><strong>Practical note on cost and latency<\/strong>: Calling an LLM once per row can become slow and expensive on larger datasets. In production, you will usually want to (1) <em>batch<\/em> requests (process many tickets per call, if your provider and prompt design allow it), (2) <em>cache<\/em> results keyed by a stable identifier (or a hash of the ticket text) so re-runs do not re-bill the same examples, and (3) implement <em>retries with backoff<\/em> to handle transient rate limits and network errors. These three practices typically make the pipeline faster, cheaper, and far more reliable.<\/p>\n<h2>Training and Evaluating the Model<\/h2>\n<p>Finally, here comes the machine learning pipeline, where the updated, fully tabular dataset is scaled, split into training and test subsets, and used to train and evaluate a random forest classifier.<\/p>\n<div id=\"urvanov-syntax-highlighter-69b06e58d103f287193341\" class=\"urvanov-syntax-highlighter-syntax crayon-theme-classic urvanov-syntax-highlighter-font-monaco urvanov-syntax-highlighter-os-pc print-yes notranslate\" data-settings=\" minimize scroll-mouseover disable-anim\" style=\" margin-top: 12px; margin-bottom: 12px; font-size: 12px !important; line-height: 15px !important;\">\n<p><textarea wrap=\"soft\" class=\"urvanov-syntax-highlighter-plain print-no\" data-settings=\"dblclick\" readonly=\"readonly\" style=\"-moz-tab-size:4; -o-tab-size:4; -webkit-tab-size:4; tab-size:4; font-size: 12px !important; line-height: 15px !important;\"><br \/>\nprint(&#8220;\\n3. Scaling and Training Random Forest&#8230;&#8221;)&#13;<br \/>\nscaler = StandardScaler()&#13;<br \/>\nX_scaled = scaler.fit_transform(X_raw)&#13;<br \/>\n&#13;<br \/>\n# Split the data into training and test&#13;<br \/>\nX_train, X_test, y_train, y_test = train_test_split(X_scaled, y, test_size=0.4, random_state=42)&#13;<br \/>\n&#13;<br \/>\n# Train a random forest classification model&#13;<br \/>\nclf = RandomForestClassifier(random_state=42)&#13;<br \/>\nclf.fit(X_train, y_train)&#13;<br \/>\n&#13;<br \/>\n# Predict and Evaluate&#13;<br \/>\ny_pred = clf.predict(X_test)&#13;<br \/>\nprint(&#8220;\\n4. Classification Report:&#8221;)&#13;<br \/>\nprint(classification_report(y_test, y_pred, zero_division=0))<\/textarea><\/p>\n<div class=\"urvanov-syntax-highlighter-main\" style=\"\">\n<table class=\"crayon-table\">\n<tr class=\"urvanov-syntax-highlighter-row\">\n<td class=\"crayon-nums \" data-settings=\"show\">\n<\/td>\n<td class=\"urvanov-syntax-highlighter-code\">\n<div class=\"crayon-pre\" style=\"font-size: 12px !important; line-height: 15px !important; -moz-tab-size:4; -o-tab-size:4; -webkit-tab-size:4; tab-size:4;\">\n<p><span class=\"crayon-e\">print<\/span><span class=\"crayon-sy\">(<\/span><span class=\"crayon-s\">&#8220;\\n3. Scaling and Training Random Forest&#8230;&#8221;<\/span><span class=\"crayon-sy\">)<\/span><\/p>\n<p><span class=\"crayon-v\">scaler<\/span><span class=\"crayon-h\"> <\/span><span class=\"crayon-o\">=<\/span><span class=\"crayon-h\"> <\/span><span class=\"crayon-e\">StandardScaler<\/span><span class=\"crayon-sy\">(<\/span><span class=\"crayon-sy\">)<\/span><\/p>\n<p><span class=\"crayon-v\">X_scaled<\/span><span class=\"crayon-h\"> <\/span><span class=\"crayon-o\">=<\/span><span class=\"crayon-h\"> <\/span><span class=\"crayon-v\">scaler<\/span><span class=\"crayon-sy\">.<\/span><span class=\"crayon-e\">fit_transform<\/span><span class=\"crayon-sy\">(<\/span><span class=\"crayon-v\">X_raw<\/span><span class=\"crayon-sy\">)<\/span><\/p>\n<p>\u00a0<\/p>\n<p><span class=\"crayon-p\"># Split the data into training and test<\/span><\/p>\n<p><span class=\"crayon-v\">X_train<\/span><span class=\"crayon-sy\">,<\/span><span class=\"crayon-h\"> <\/span><span class=\"crayon-v\">X_test<\/span><span class=\"crayon-sy\">,<\/span><span class=\"crayon-h\"> <\/span><span class=\"crayon-v\">y_train<\/span><span class=\"crayon-sy\">,<\/span><span class=\"crayon-h\"> <\/span><span class=\"crayon-v\">y_test<\/span><span class=\"crayon-h\"> <\/span><span class=\"crayon-o\">=<\/span><span class=\"crayon-h\"> <\/span><span class=\"crayon-e\">train_test_split<\/span><span class=\"crayon-sy\">(<\/span><span class=\"crayon-v\">X_scaled<\/span><span class=\"crayon-sy\">,<\/span><span class=\"crayon-h\"> <\/span><span class=\"crayon-v\">y<\/span><span class=\"crayon-sy\">,<\/span><span class=\"crayon-h\"> <\/span><span class=\"crayon-v\">test_size<\/span><span class=\"crayon-o\">=<\/span><span class=\"crayon-cn\">0.4<\/span><span class=\"crayon-sy\">,<\/span><span class=\"crayon-h\"> <\/span><span class=\"crayon-v\">random_state<\/span><span class=\"crayon-o\">=<\/span><span class=\"crayon-cn\">42<\/span><span class=\"crayon-sy\">)<\/span><\/p>\n<p>\u00a0<\/p>\n<p><span class=\"crayon-p\"># Train a random forest classification model<\/span><\/p>\n<p><span class=\"crayon-v\">clf<\/span><span class=\"crayon-h\"> <\/span><span class=\"crayon-o\">=<\/span><span class=\"crayon-h\"> <\/span><span class=\"crayon-e\">RandomForestClassifier<\/span><span class=\"crayon-sy\">(<\/span><span class=\"crayon-v\">random_state<\/span><span class=\"crayon-o\">=<\/span><span class=\"crayon-cn\">42<\/span><span class=\"crayon-sy\">)<\/span><\/p>\n<p><span class=\"crayon-v\">clf<\/span><span class=\"crayon-sy\">.<\/span><span class=\"crayon-e\">fit<\/span><span class=\"crayon-sy\">(<\/span><span class=\"crayon-v\">X_train<\/span><span class=\"crayon-sy\">,<\/span><span class=\"crayon-h\"> <\/span><span class=\"crayon-v\">y_train<\/span><span class=\"crayon-sy\">)<\/span><\/p>\n<p>\u00a0<\/p>\n<p><span class=\"crayon-p\"># Predict and Evaluate<\/span><\/p>\n<p><span class=\"crayon-v\">y_pred<\/span><span class=\"crayon-h\"> <\/span><span class=\"crayon-o\">=<\/span><span class=\"crayon-h\"> <\/span><span class=\"crayon-v\">clf<\/span><span class=\"crayon-sy\">.<\/span><span class=\"crayon-e\">predict<\/span><span class=\"crayon-sy\">(<\/span><span class=\"crayon-v\">X_test<\/span><span class=\"crayon-sy\">)<\/span><\/p>\n<p><span class=\"crayon-e\">print<\/span><span class=\"crayon-sy\">(<\/span><span class=\"crayon-s\">&#8220;\\n4. Classification Report:&#8221;<\/span><span class=\"crayon-sy\">)<\/span><\/p>\n<p><span class=\"crayon-e\">print<\/span><span class=\"crayon-sy\">(<\/span><span class=\"crayon-e\">classification_report<\/span><span class=\"crayon-sy\">(<\/span><span class=\"crayon-v\">y_test<\/span><span class=\"crayon-sy\">,<\/span><span class=\"crayon-h\"> <\/span><span class=\"crayon-v\">y_pred<\/span><span class=\"crayon-sy\">,<\/span><span class=\"crayon-h\"> <\/span><span class=\"crayon-v\">zero_division<\/span><span class=\"crayon-o\">=<\/span><span class=\"crayon-cn\">0<\/span><span class=\"crayon-sy\">)<\/span><span class=\"crayon-sy\">)<\/span><\/p>\n<\/div>\n<\/td>\n<\/tr>\n<\/table><\/div>\n<\/p><\/div>\n<p>Here are the classifier results:<\/p>\n<div id=\"urvanov-syntax-highlighter-69b06e58d1043104211591\" class=\"urvanov-syntax-highlighter-syntax crayon-theme-classic urvanov-syntax-highlighter-font-monaco urvanov-syntax-highlighter-os-pc print-yes notranslate\" data-settings=\" minimize scroll-mouseover disable-anim\" style=\" margin-top: 12px; margin-bottom: 12px; font-size: 12px !important; line-height: 15px !important;\">\n<p><textarea wrap=\"soft\" class=\"urvanov-syntax-highlighter-plain print-no\" data-settings=\"dblclick\" readonly=\"readonly\" style=\"-moz-tab-size:4; -o-tab-size:4; -webkit-tab-size:4; tab-size:4; font-size: 12px !important; line-height: 15px !important;\"><br \/>\nClassification Report:&#13;<br \/>\n              precision    recall  f1-score   support&#13;<br \/>\n&#13;<br \/>\n      access       0.22      0.18      0.20        11&#13;<br \/>\n     billing       0.29      0.33      0.31         6&#13;<br \/>\n    hardware       0.29      0.25      0.27         8&#13;<br \/>\n     inquiry       1.00      1.00      1.00         8&#13;<br \/>\n    software       0.44      0.57      0.50         7&#13;<br \/>\n&#13;<br \/>\n    accuracy                           0.45        40&#13;<br \/>\n   macro avg       0.45      0.47      0.45        40&#13;<br \/>\nweighted avg       0.44      0.45      0.44        40<\/textarea><\/p>\n<div class=\"urvanov-syntax-highlighter-main\" style=\"\">\n<table class=\"crayon-table\">\n<tr class=\"urvanov-syntax-highlighter-row\">\n<td class=\"crayon-nums \" data-settings=\"show\">\n<\/td>\n<td class=\"urvanov-syntax-highlighter-code\">\n<div class=\"crayon-pre\" style=\"font-size: 12px !important; line-height: 15px !important; -moz-tab-size:4; -o-tab-size:4; -webkit-tab-size:4; tab-size:4;\">\n<p><span class=\"crayon-e\">Classification <\/span><span class=\"crayon-v\">Report<\/span><span class=\"crayon-o\">:<\/span><\/p>\n<p><span class=\"crayon-h\">\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0<\/span><span class=\"crayon-e\">precision\u00a0\u00a0\u00a0\u00a0<\/span><span class=\"crayon-e\">recall\u00a0\u00a0<\/span><span class=\"crayon-v\">f1<\/span><span class=\"crayon-o\">&#8211;<\/span><span class=\"crayon-e\">score\u00a0\u00a0 <\/span><span class=\"crayon-e\">support<\/span><\/p>\n<p>\u00a0<\/p>\n<p><span class=\"crayon-e\">\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0<\/span><span class=\"crayon-i\">access<\/span><span class=\"crayon-h\">\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0 <\/span><span class=\"crayon-cn\">0.22<\/span><span class=\"crayon-h\">\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0<\/span><span class=\"crayon-cn\">0.18<\/span><span class=\"crayon-h\">\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0<\/span><span class=\"crayon-cn\">0.20<\/span><span class=\"crayon-h\">\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0<\/span><span class=\"crayon-cn\">11<\/span><\/p>\n<p><span class=\"crayon-h\">\u00a0\u00a0\u00a0\u00a0 <\/span><span class=\"crayon-i\">billing<\/span><span class=\"crayon-h\">\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0 <\/span><span class=\"crayon-cn\">0.29<\/span><span class=\"crayon-h\">\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0<\/span><span class=\"crayon-cn\">0.33<\/span><span class=\"crayon-h\">\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0<\/span><span class=\"crayon-cn\">0.31<\/span><span class=\"crayon-h\">\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0 <\/span><span class=\"crayon-cn\">6<\/span><\/p>\n<p><span class=\"crayon-h\">\u00a0\u00a0\u00a0\u00a0<\/span><span class=\"crayon-i\">hardware<\/span><span class=\"crayon-h\">\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0 <\/span><span class=\"crayon-cn\">0.29<\/span><span class=\"crayon-h\">\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0<\/span><span class=\"crayon-cn\">0.25<\/span><span class=\"crayon-h\">\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0<\/span><span class=\"crayon-cn\">0.27<\/span><span class=\"crayon-h\">\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0 <\/span><span class=\"crayon-cn\">8<\/span><\/p>\n<p><span class=\"crayon-h\">\u00a0\u00a0\u00a0\u00a0 <\/span><span class=\"crayon-i\">inquiry<\/span><span class=\"crayon-h\">\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0 <\/span><span class=\"crayon-cn\">1.00<\/span><span class=\"crayon-h\">\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0<\/span><span class=\"crayon-cn\">1.00<\/span><span class=\"crayon-h\">\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0<\/span><span class=\"crayon-cn\">1.00<\/span><span class=\"crayon-h\">\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0 <\/span><span class=\"crayon-cn\">8<\/span><\/p>\n<p><span class=\"crayon-h\">\u00a0\u00a0\u00a0\u00a0<\/span><span class=\"crayon-i\">software<\/span><span class=\"crayon-h\">\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0 <\/span><span class=\"crayon-cn\">0.44<\/span><span class=\"crayon-h\">\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0<\/span><span class=\"crayon-cn\">0.57<\/span><span class=\"crayon-h\">\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0<\/span><span class=\"crayon-cn\">0.50<\/span><span class=\"crayon-h\">\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0 <\/span><span class=\"crayon-cn\">7<\/span><\/p>\n<p>\u00a0<\/p>\n<p><span class=\"crayon-h\">\u00a0\u00a0\u00a0\u00a0<\/span><span class=\"crayon-i\">accuracy<\/span><span class=\"crayon-h\">\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0 <\/span><span class=\"crayon-cn\">0.45<\/span><span class=\"crayon-h\">\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0<\/span><span class=\"crayon-cn\">40<\/span><\/p>\n<p><span class=\"crayon-h\">\u00a0\u00a0 <\/span><span class=\"crayon-e\">macro <\/span><span class=\"crayon-i\">avg<\/span><span class=\"crayon-h\">\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0 <\/span><span class=\"crayon-cn\">0.45<\/span><span class=\"crayon-h\">\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0<\/span><span class=\"crayon-cn\">0.47<\/span><span class=\"crayon-h\">\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0<\/span><span class=\"crayon-cn\">0.45<\/span><span class=\"crayon-h\">\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0<\/span><span class=\"crayon-cn\">40<\/span><\/p>\n<p><span class=\"crayon-e\">weighted <\/span><span class=\"crayon-i\">avg<\/span><span class=\"crayon-h\">\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0 <\/span><span class=\"crayon-cn\">0.44<\/span><span class=\"crayon-h\">\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0<\/span><span class=\"crayon-cn\">0.45<\/span><span class=\"crayon-h\">\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0<\/span><span class=\"crayon-cn\">0.44<\/span><span class=\"crayon-h\">\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0<\/span><span class=\"crayon-cn\">40<\/span><\/p>\n<\/div>\n<\/td>\n<\/tr>\n<\/table><\/div>\n<\/p><\/div>\n<p>If you used the code for generating a synthetic toy dataset, you may get a rather disappointing classifier result in terms of accuracy, precision, recall, and so on. This is normal: for the sake of efficiency and simplicity, we used a small, partly random set of 100 instances \u2014 which is typically too small (and arguably too random) to perform well. The key here is the process of turning raw text into meaningful features through the use of a pre-trained LLM via API, which should work reliably.<\/p>\n<h2>Summary<\/h2>\n<p>This article takes a gentle tour through the process of turning raw text into fully tabular features for downstream machine learning modeling. The key trick shown along the way is using a pre-trained LLM to perform inference and return structured outputs via effective prompting.<\/p>\n<\/p><\/div>\n","protected":false},"excerpt":{"rendered":"<p>In this article, you will learn how to use a pre-trained large language model to extract structured features from text and combine them with numeric columns to train a supervised classifier. Topics we will cover include: Creating a toy dataset with mixed text and numeric fields for classification Using a Groq-hosted LLaMA model to extract<\/p>\n","protected":false},"author":1,"featured_media":178874,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[42],"tags":[],"class_list":{"0":"post-178873","1":"post","2":"type-post","3":"status-publish","4":"format-standard","5":"has-post-thumbnail","7":"category-ai"},"yoast_head":"<!-- This site is optimized with the Yoast SEO plugin v26.4 - https:\/\/yoast.com\/wordpress\/plugins\/seo\/ -->\n<title>From Text to Tables: Feature Engineering with LLMs for Tabular Data - Ktromedia<\/title>\n<meta name=\"robots\" content=\"index, follow, max-snippet:-1, max-image-preview:large, max-video-preview:-1\" \/>\n<link rel=\"canonical\" href=\"https:\/\/ktromedia.com\/?p=178873\" \/>\n<meta property=\"og:locale\" content=\"en_US\" \/>\n<meta property=\"og:type\" content=\"article\" \/>\n<meta property=\"og:title\" content=\"From Text to Tables: Feature Engineering with LLMs for Tabular Data - Ktromedia\" \/>\n<meta property=\"og:description\" content=\"In this article, you will learn how to use a pre-trained large language model to extract structured features from text and combine them with numeric columns to train a supervised classifier. Topics we will cover include: Creating a toy dataset with mixed text and numeric fields for classification Using a Groq-hosted LLaMA model to extract\" \/>\n<meta property=\"og:url\" content=\"https:\/\/ktromedia.com\/?p=178873\" \/>\n<meta property=\"og:site_name\" content=\"Ktromedia\" \/>\n<meta property=\"article:publisher\" content=\"https:\/\/www.facebook.com\/KTROMedia\/\" \/>\n<meta property=\"article:published_time\" content=\"2026-03-13T16:10:13+00:00\" \/>\n<meta property=\"og:image\" content=\"https:\/\/ktromedia.com\/wp-content\/uploads\/2026\/03\/1773418212_From-Text-to-Tables-Feature-Engineering-with-LLMs-for-Tabular.png\" \/>\n\t<meta property=\"og:image:width\" content=\"1024\" \/>\n\t<meta property=\"og:image:height\" content=\"571\" \/>\n\t<meta property=\"og:image:type\" content=\"image\/png\" \/>\n<meta name=\"author\" content=\"KTRO TEAM\" \/>\n<meta name=\"twitter:card\" content=\"summary_large_image\" \/>\n<meta name=\"twitter:label1\" content=\"Written by\" \/>\n\t<meta name=\"twitter:data1\" content=\"KTRO TEAM\" \/>\n\t<meta name=\"twitter:label2\" content=\"Est. reading time\" \/>\n\t<meta name=\"twitter:data2\" content=\"10 minutes\" \/>\n<script type=\"application\/ld+json\" class=\"yoast-schema-graph\">{\"@context\":\"https:\/\/schema.org\",\"@graph\":[{\"@type\":\"Article\",\"@id\":\"https:\/\/ktromedia.com\/?p=178873#article\",\"isPartOf\":{\"@id\":\"https:\/\/ktromedia.com\/?p=178873\"},\"author\":{\"name\":\"KTRO TEAM\",\"@id\":\"https:\/\/ktromedia.com\/#\/schema\/person\/612bf2fbac107722ea365932cdd35f5b\"},\"headline\":\"From Text to Tables: Feature Engineering with LLMs for Tabular Data\",\"datePublished\":\"2026-03-13T16:10:13+00:00\",\"mainEntityOfPage\":{\"@id\":\"https:\/\/ktromedia.com\/?p=178873\"},\"wordCount\":2106,\"commentCount\":0,\"publisher\":{\"@id\":\"https:\/\/ktromedia.com\/#organization\"},\"image\":{\"@id\":\"https:\/\/ktromedia.com\/?p=178873#primaryimage\"},\"thumbnailUrl\":\"https:\/\/ktromedia.com\/wp-content\/uploads\/2026\/03\/1773418212_From-Text-to-Tables-Feature-Engineering-with-LLMs-for-Tabular.png\",\"articleSection\":[\"\u4eba\u5de5\u667a\u80fd\"],\"inLanguage\":\"en-US\",\"potentialAction\":[{\"@type\":\"CommentAction\",\"name\":\"Comment\",\"target\":[\"https:\/\/ktromedia.com\/?p=178873#respond\"]}]},{\"@type\":\"WebPage\",\"@id\":\"https:\/\/ktromedia.com\/?p=178873\",\"url\":\"https:\/\/ktromedia.com\/?p=178873\",\"name\":\"From Text to Tables: Feature Engineering with LLMs for Tabular Data - Ktromedia\",\"isPartOf\":{\"@id\":\"https:\/\/ktromedia.com\/#website\"},\"primaryImageOfPage\":{\"@id\":\"https:\/\/ktromedia.com\/?p=178873#primaryimage\"},\"image\":{\"@id\":\"https:\/\/ktromedia.com\/?p=178873#primaryimage\"},\"thumbnailUrl\":\"https:\/\/ktromedia.com\/wp-content\/uploads\/2026\/03\/1773418212_From-Text-to-Tables-Feature-Engineering-with-LLMs-for-Tabular.png\",\"datePublished\":\"2026-03-13T16:10:13+00:00\",\"breadcrumb\":{\"@id\":\"https:\/\/ktromedia.com\/?p=178873#breadcrumb\"},\"inLanguage\":\"en-US\",\"potentialAction\":[{\"@type\":\"ReadAction\",\"target\":[\"https:\/\/ktromedia.com\/?p=178873\"]}]},{\"@type\":\"ImageObject\",\"inLanguage\":\"en-US\",\"@id\":\"https:\/\/ktromedia.com\/?p=178873#primaryimage\",\"url\":\"https:\/\/ktromedia.com\/wp-content\/uploads\/2026\/03\/1773418212_From-Text-to-Tables-Feature-Engineering-with-LLMs-for-Tabular.png\",\"contentUrl\":\"https:\/\/ktromedia.com\/wp-content\/uploads\/2026\/03\/1773418212_From-Text-to-Tables-Feature-Engineering-with-LLMs-for-Tabular.png\",\"width\":1024,\"height\":571},{\"@type\":\"BreadcrumbList\",\"@id\":\"https:\/\/ktromedia.com\/?p=178873#breadcrumb\",\"itemListElement\":[{\"@type\":\"ListItem\",\"position\":1,\"name\":\"Home\",\"item\":\"https:\/\/ktromedia.com\/\"},{\"@type\":\"ListItem\",\"position\":2,\"name\":\"From Text to Tables: Feature Engineering with LLMs for Tabular Data\"}]},{\"@type\":\"WebSite\",\"@id\":\"https:\/\/ktromedia.com\/#website\",\"url\":\"https:\/\/ktromedia.com\/\",\"name\":\"Ktromedia\",\"description\":\"KTRO MEDIA Crypto News\",\"publisher\":{\"@id\":\"https:\/\/ktromedia.com\/#organization\"},\"potentialAction\":[{\"@type\":\"SearchAction\",\"target\":{\"@type\":\"EntryPoint\",\"urlTemplate\":\"https:\/\/ktromedia.com\/?s={search_term_string}\"},\"query-input\":{\"@type\":\"PropertyValueSpecification\",\"valueRequired\":true,\"valueName\":\"search_term_string\"}}],\"inLanguage\":\"en-US\"},{\"@type\":\"Organization\",\"@id\":\"https:\/\/ktromedia.com\/#organization\",\"name\":\"Ktromedia\",\"url\":\"https:\/\/ktromedia.com\/\",\"logo\":{\"@type\":\"ImageObject\",\"inLanguage\":\"en-US\",\"@id\":\"https:\/\/ktromedia.com\/#\/schema\/logo\/image\/\",\"url\":\"https:\/\/ktromedia.com\/wp-content\/uploads\/2025\/11\/ktroicon.png\",\"contentUrl\":\"https:\/\/ktromedia.com\/wp-content\/uploads\/2025\/11\/ktroicon.png\",\"width\":250,\"height\":250,\"caption\":\"Ktromedia\"},\"image\":{\"@id\":\"https:\/\/ktromedia.com\/#\/schema\/logo\/image\/\"},\"sameAs\":[\"https:\/\/www.facebook.com\/KTROMedia\/\",\"https:\/\/www.linkedin.com\/company\/ktro-media\/\",\"https:\/\/t.me\/ktrogroup\"]},{\"@type\":\"Person\",\"@id\":\"https:\/\/ktromedia.com\/#\/schema\/person\/612bf2fbac107722ea365932cdd35f5b\",\"name\":\"KTRO TEAM\",\"image\":{\"@type\":\"ImageObject\",\"inLanguage\":\"en-US\",\"@id\":\"https:\/\/ktromedia.com\/#\/schema\/person\/image\/\",\"url\":\"https:\/\/ktromedia.com\/wp-content\/uploads\/2025\/10\/cropped-Untitled-design-7-1-150x150.png\",\"contentUrl\":\"https:\/\/ktromedia.com\/wp-content\/uploads\/2025\/10\/cropped-Untitled-design-7-1-150x150.png\",\"caption\":\"KTRO TEAM\"},\"description\":\"KTRO MEDIA \u662f\u4e00\u5bb6\u5168\u7403\u6027\u7684\u534e\u6587WEB3\u5a92\u4f53\u516c\u53f8\u3002\u6211\u4eec\u81f4\u529b\u4e8e\u4e3a\u533a\u5757\u94fe\u548c\u91d1\u878d\u79d1\u6280\u9886\u57df\u63d0\u4f9b\u6700\u65b0\u7684\u65b0\u95fb\u3001\u89c1\u89e3\u548c\u8d8b\u52bf\u5206\u6790\u3002\u6211\u4eec\u7684\u5b97\u65e8\u662f\u4e3a\u5168\u7403\u7528\u6237\u63d0\u4f9b\u9ad8\u8d28\u91cf\u3001\u5168\u9762\u7684\u8d44\u8baf\u670d\u52a1\uff0c\u8ba9\u4ed6\u4eec\u66f4\u597d\u5730\u4e86\u89e3\u533a\u5757\u94fe\u548c\u91d1\u878d\u79d1\u6280\u884c\u4e1a\u7684\u6700\u65b0\u52a8\u6001\u3002\u6211\u4eec\u4e5f\u5e0c\u671b\u80fd\u5e2e\u5230\u66f4\u591a\u4f18\u79c0\u7684WEB3\u4ea7\u54c1\u627e\u5230\u66f4\u591a\u66f4\u597d\u7684\u8d44\u6e90\u597d\u8ba9\u8fd9\u9886\u57df\u53d8\u5f97\u66f4\u6210\u719f\u3002 \u6211\u4eec\u7684\u62a5\u9053\u8303\u56f4\u6db5\u76d6\u4e86\u533a\u5757\u94fe\u3001\u52a0\u5bc6\u8d27\u5e01\u3001\u667a\u80fd\u5408\u7ea6\u3001DeFi\u3001NFT \u548c Web3 \u751f\u6001\u7cfb\u7edf\u7b49\u9886\u57df\u3002\u6211\u4eec\u7684\u62a5\u9053\u4e0d\u4ec5\u6765\u81ea\u884c\u4e1a\u5185\u7684\u4e13\u5bb6\uff0c\u5148\u950b\u8005\u4e5f\u5305\u62ec\u4e86\u6211\u4eec\u81ea\u5df1\u7684\u5206\u6790\u548c\u89c2\u70b9\u3002\u6211\u4eec\u5728\u5404\u4e2a\u56fd\u5bb6\u548c\u5730\u533a\u90fd\u8bbe\u6709\u56e2\u961f\uff0c\u4e3a\u8bfb\u8005\u63d0\u4f9b\u672c\u5730\u5316\u7684\u62a5\u9053\u548c\u5206\u6790\u3002 \u9664\u4e86\u65b0\u95fb\u62a5\u9053\uff0c\u6211\u4eec\u8fd8\u63d0\u4f9b\u5e02\u573a\u7814\u7a76\u548c\u54a8\u8be2\u670d\u52a1\u3002\u6211\u4eec\u7684\u4e13\u4e1a\u56e2\u961f\u53ef\u4ee5\u4e3a\u60a8\u63d0\u4f9b\u6709\u5173\u533a\u5757\u94fe\u548c\u91d1\u878d\u79d1\u6280\u884c\u4e1a\u7684\u6df1\u5165\u5206\u6790\u548c\u5e02\u573a\u8d8b\u52bf\uff0c\u5e2e\u52a9\u60a8\u505a\u51fa\u66f4\u660e\u667a\u7684\u6295\u8d44\u51b3\u7b56\u3002 \u6211\u4eec\u7684\u4f7f\u547d\u662f\u6210\u4e3a\u5168\u7403\u534e\u6587\u533a\u5757\u94fe\u548c\u91d1\u878d\u79d1\u6280\u884c\u4e1a\u6700\u53d7\u4fe1\u8d56\u7684\u4fe1\u606f\u6765\u6e90\u4e4b\u4e00\u3002\u6211\u4eec\u5c06\u7ee7\u7eed\u4e0d\u65ad\u52aa\u529b\uff0c\u4e3a\u8bfb\u8005\u63d0\u4f9b\u6700\u65b0\u3001\u6700\u5168\u9762\u3001\u6700\u53ef\u9760\u7684\u4fe1\u606f\u670d\u52a1\u3002\",\"sameAs\":[\"https:\/\/ktromedia.com\"],\"url\":\"https:\/\/ktromedia.com\/?author=1\"}]}<\/script>\n<!-- \/ Yoast SEO plugin. -->","yoast_head_json":{"title":"From Text to Tables: Feature Engineering with LLMs for Tabular Data - Ktromedia","robots":{"index":"index","follow":"follow","max-snippet":"max-snippet:-1","max-image-preview":"max-image-preview:large","max-video-preview":"max-video-preview:-1"},"canonical":"https:\/\/ktromedia.com\/?p=178873","og_locale":"en_US","og_type":"article","og_title":"From Text to Tables: Feature Engineering with LLMs for Tabular Data - Ktromedia","og_description":"In this article, you will learn how to use a pre-trained large language model to extract structured features from text and combine them with numeric columns to train a supervised classifier. Topics we will cover include: Creating a toy dataset with mixed text and numeric fields for classification Using a Groq-hosted LLaMA model to extract","og_url":"https:\/\/ktromedia.com\/?p=178873","og_site_name":"Ktromedia","article_publisher":"https:\/\/www.facebook.com\/KTROMedia\/","article_published_time":"2026-03-13T16:10:13+00:00","og_image":[{"width":1024,"height":571,"url":"https:\/\/ktromedia.com\/wp-content\/uploads\/2026\/03\/1773418212_From-Text-to-Tables-Feature-Engineering-with-LLMs-for-Tabular.png","type":"image\/png"}],"author":"KTRO TEAM","twitter_card":"summary_large_image","twitter_misc":{"Written by":"KTRO TEAM","Est. reading time":"10 minutes"},"schema":{"@context":"https:\/\/schema.org","@graph":[{"@type":"Article","@id":"https:\/\/ktromedia.com\/?p=178873#article","isPartOf":{"@id":"https:\/\/ktromedia.com\/?p=178873"},"author":{"name":"KTRO TEAM","@id":"https:\/\/ktromedia.com\/#\/schema\/person\/612bf2fbac107722ea365932cdd35f5b"},"headline":"From Text to Tables: Feature Engineering with LLMs for Tabular Data","datePublished":"2026-03-13T16:10:13+00:00","mainEntityOfPage":{"@id":"https:\/\/ktromedia.com\/?p=178873"},"wordCount":2106,"commentCount":0,"publisher":{"@id":"https:\/\/ktromedia.com\/#organization"},"image":{"@id":"https:\/\/ktromedia.com\/?p=178873#primaryimage"},"thumbnailUrl":"https:\/\/ktromedia.com\/wp-content\/uploads\/2026\/03\/1773418212_From-Text-to-Tables-Feature-Engineering-with-LLMs-for-Tabular.png","articleSection":["\u4eba\u5de5\u667a\u80fd"],"inLanguage":"en-US","potentialAction":[{"@type":"CommentAction","name":"Comment","target":["https:\/\/ktromedia.com\/?p=178873#respond"]}]},{"@type":"WebPage","@id":"https:\/\/ktromedia.com\/?p=178873","url":"https:\/\/ktromedia.com\/?p=178873","name":"From Text to Tables: Feature Engineering with LLMs for Tabular Data - Ktromedia","isPartOf":{"@id":"https:\/\/ktromedia.com\/#website"},"primaryImageOfPage":{"@id":"https:\/\/ktromedia.com\/?p=178873#primaryimage"},"image":{"@id":"https:\/\/ktromedia.com\/?p=178873#primaryimage"},"thumbnailUrl":"https:\/\/ktromedia.com\/wp-content\/uploads\/2026\/03\/1773418212_From-Text-to-Tables-Feature-Engineering-with-LLMs-for-Tabular.png","datePublished":"2026-03-13T16:10:13+00:00","breadcrumb":{"@id":"https:\/\/ktromedia.com\/?p=178873#breadcrumb"},"inLanguage":"en-US","potentialAction":[{"@type":"ReadAction","target":["https:\/\/ktromedia.com\/?p=178873"]}]},{"@type":"ImageObject","inLanguage":"en-US","@id":"https:\/\/ktromedia.com\/?p=178873#primaryimage","url":"https:\/\/ktromedia.com\/wp-content\/uploads\/2026\/03\/1773418212_From-Text-to-Tables-Feature-Engineering-with-LLMs-for-Tabular.png","contentUrl":"https:\/\/ktromedia.com\/wp-content\/uploads\/2026\/03\/1773418212_From-Text-to-Tables-Feature-Engineering-with-LLMs-for-Tabular.png","width":1024,"height":571},{"@type":"BreadcrumbList","@id":"https:\/\/ktromedia.com\/?p=178873#breadcrumb","itemListElement":[{"@type":"ListItem","position":1,"name":"Home","item":"https:\/\/ktromedia.com\/"},{"@type":"ListItem","position":2,"name":"From Text to Tables: Feature Engineering with LLMs for Tabular Data"}]},{"@type":"WebSite","@id":"https:\/\/ktromedia.com\/#website","url":"https:\/\/ktromedia.com\/","name":"Ktromedia","description":"KTRO MEDIA Crypto News","publisher":{"@id":"https:\/\/ktromedia.com\/#organization"},"potentialAction":[{"@type":"SearchAction","target":{"@type":"EntryPoint","urlTemplate":"https:\/\/ktromedia.com\/?s={search_term_string}"},"query-input":{"@type":"PropertyValueSpecification","valueRequired":true,"valueName":"search_term_string"}}],"inLanguage":"en-US"},{"@type":"Organization","@id":"https:\/\/ktromedia.com\/#organization","name":"Ktromedia","url":"https:\/\/ktromedia.com\/","logo":{"@type":"ImageObject","inLanguage":"en-US","@id":"https:\/\/ktromedia.com\/#\/schema\/logo\/image\/","url":"https:\/\/ktromedia.com\/wp-content\/uploads\/2025\/11\/ktroicon.png","contentUrl":"https:\/\/ktromedia.com\/wp-content\/uploads\/2025\/11\/ktroicon.png","width":250,"height":250,"caption":"Ktromedia"},"image":{"@id":"https:\/\/ktromedia.com\/#\/schema\/logo\/image\/"},"sameAs":["https:\/\/www.facebook.com\/KTROMedia\/","https:\/\/www.linkedin.com\/company\/ktro-media\/","https:\/\/t.me\/ktrogroup"]},{"@type":"Person","@id":"https:\/\/ktromedia.com\/#\/schema\/person\/612bf2fbac107722ea365932cdd35f5b","name":"KTRO TEAM","image":{"@type":"ImageObject","inLanguage":"en-US","@id":"https:\/\/ktromedia.com\/#\/schema\/person\/image\/","url":"https:\/\/ktromedia.com\/wp-content\/uploads\/2025\/10\/cropped-Untitled-design-7-1-150x150.png","contentUrl":"https:\/\/ktromedia.com\/wp-content\/uploads\/2025\/10\/cropped-Untitled-design-7-1-150x150.png","caption":"KTRO TEAM"},"description":"KTRO MEDIA \u662f\u4e00\u5bb6\u5168\u7403\u6027\u7684\u534e\u6587WEB3\u5a92\u4f53\u516c\u53f8\u3002\u6211\u4eec\u81f4\u529b\u4e8e\u4e3a\u533a\u5757\u94fe\u548c\u91d1\u878d\u79d1\u6280\u9886\u57df\u63d0\u4f9b\u6700\u65b0\u7684\u65b0\u95fb\u3001\u89c1\u89e3\u548c\u8d8b\u52bf\u5206\u6790\u3002\u6211\u4eec\u7684\u5b97\u65e8\u662f\u4e3a\u5168\u7403\u7528\u6237\u63d0\u4f9b\u9ad8\u8d28\u91cf\u3001\u5168\u9762\u7684\u8d44\u8baf\u670d\u52a1\uff0c\u8ba9\u4ed6\u4eec\u66f4\u597d\u5730\u4e86\u89e3\u533a\u5757\u94fe\u548c\u91d1\u878d\u79d1\u6280\u884c\u4e1a\u7684\u6700\u65b0\u52a8\u6001\u3002\u6211\u4eec\u4e5f\u5e0c\u671b\u80fd\u5e2e\u5230\u66f4\u591a\u4f18\u79c0\u7684WEB3\u4ea7\u54c1\u627e\u5230\u66f4\u591a\u66f4\u597d\u7684\u8d44\u6e90\u597d\u8ba9\u8fd9\u9886\u57df\u53d8\u5f97\u66f4\u6210\u719f\u3002 \u6211\u4eec\u7684\u62a5\u9053\u8303\u56f4\u6db5\u76d6\u4e86\u533a\u5757\u94fe\u3001\u52a0\u5bc6\u8d27\u5e01\u3001\u667a\u80fd\u5408\u7ea6\u3001DeFi\u3001NFT \u548c Web3 \u751f\u6001\u7cfb\u7edf\u7b49\u9886\u57df\u3002\u6211\u4eec\u7684\u62a5\u9053\u4e0d\u4ec5\u6765\u81ea\u884c\u4e1a\u5185\u7684\u4e13\u5bb6\uff0c\u5148\u950b\u8005\u4e5f\u5305\u62ec\u4e86\u6211\u4eec\u81ea\u5df1\u7684\u5206\u6790\u548c\u89c2\u70b9\u3002\u6211\u4eec\u5728\u5404\u4e2a\u56fd\u5bb6\u548c\u5730\u533a\u90fd\u8bbe\u6709\u56e2\u961f\uff0c\u4e3a\u8bfb\u8005\u63d0\u4f9b\u672c\u5730\u5316\u7684\u62a5\u9053\u548c\u5206\u6790\u3002 \u9664\u4e86\u65b0\u95fb\u62a5\u9053\uff0c\u6211\u4eec\u8fd8\u63d0\u4f9b\u5e02\u573a\u7814\u7a76\u548c\u54a8\u8be2\u670d\u52a1\u3002\u6211\u4eec\u7684\u4e13\u4e1a\u56e2\u961f\u53ef\u4ee5\u4e3a\u60a8\u63d0\u4f9b\u6709\u5173\u533a\u5757\u94fe\u548c\u91d1\u878d\u79d1\u6280\u884c\u4e1a\u7684\u6df1\u5165\u5206\u6790\u548c\u5e02\u573a\u8d8b\u52bf\uff0c\u5e2e\u52a9\u60a8\u505a\u51fa\u66f4\u660e\u667a\u7684\u6295\u8d44\u51b3\u7b56\u3002 \u6211\u4eec\u7684\u4f7f\u547d\u662f\u6210\u4e3a\u5168\u7403\u534e\u6587\u533a\u5757\u94fe\u548c\u91d1\u878d\u79d1\u6280\u884c\u4e1a\u6700\u53d7\u4fe1\u8d56\u7684\u4fe1\u606f\u6765\u6e90\u4e4b\u4e00\u3002\u6211\u4eec\u5c06\u7ee7\u7eed\u4e0d\u65ad\u52aa\u529b\uff0c\u4e3a\u8bfb\u8005\u63d0\u4f9b\u6700\u65b0\u3001\u6700\u5168\u9762\u3001\u6700\u53ef\u9760\u7684\u4fe1\u606f\u670d\u52a1\u3002","sameAs":["https:\/\/ktromedia.com"],"url":"https:\/\/ktromedia.com\/?author=1"}]}},"_links":{"self":[{"href":"https:\/\/ktromedia.com\/index.php?rest_route=\/wp\/v2\/posts\/178873","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/ktromedia.com\/index.php?rest_route=\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/ktromedia.com\/index.php?rest_route=\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/ktromedia.com\/index.php?rest_route=\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/ktromedia.com\/index.php?rest_route=%2Fwp%2Fv2%2Fcomments&post=178873"}],"version-history":[{"count":1,"href":"https:\/\/ktromedia.com\/index.php?rest_route=\/wp\/v2\/posts\/178873\/revisions"}],"predecessor-version":[{"id":178875,"href":"https:\/\/ktromedia.com\/index.php?rest_route=\/wp\/v2\/posts\/178873\/revisions\/178875"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/ktromedia.com\/index.php?rest_route=\/wp\/v2\/media\/178874"}],"wp:attachment":[{"href":"https:\/\/ktromedia.com\/index.php?rest_route=%2Fwp%2Fv2%2Fmedia&parent=178873"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/ktromedia.com\/index.php?rest_route=%2Fwp%2Fv2%2Fcategories&post=178873"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/ktromedia.com\/index.php?rest_route=%2Fwp%2Fv2%2Ftags&post=178873"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}