<a target="_blank" href="https://colab.research.google.com/github/elastic/elasticsearch-labs/blob/main/notebooks/esql/esql-getting-started.ipynb"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Getting started with ES|QL

In this notebook you'll learn the basics of the Elasticsearch Query Language (ES|QL).
You'll be using the official [Elasticsearch Python client](https://www.elastic.co/guide/en/elasticsearch/client/python-api/current/connecting.html).

You'll learn how to:

- Run an ES|QL query
- Use processing commands
- Sort a table
- Query data
- Chain processing commands
- Compute values
- Calculate statistics
- Access columns
- Create a histogram
- Enrich data
- Process data

> ℹ️ ES|QL is generally available as of Elastic Stack version **8.14.0**.

## Create Elastic Cloud deployment

If you don't have an Elastic Cloud deployment, sign up [here](https://cloud.elastic.co/registration?onboarding_token=search&utm_source=github&utm_content=elasticsearch-labs-notebook) for a free trial.

Once logged in to your Elastic Cloud account, go to the [Create deployment](https://cloud.elastic.co/deployments/create) page and select **Create deployment**. Leave all settings with their default values.

## Install packages and import modules

To get started, we'll need to connect to our Elastic deployment using the Python client. Because we're using an Elastic Cloud deployment, we'll use the **Cloud ID** to identify our deployment.

First we need to install the `elasticsearch` Python client.

In [1]:
# Install packages

!pip install elasticsearch

Collecting elasticsearch
  Downloading elasticsearch-8.14.0-py3-none-any.whl (480 kB)
Collecting elastic-transport<9,>=8.13 (from elasticsearch)
  Downloading elastic_transport-8.13.1-py3-none-any.whl (64 kB)
Installing collected packages: elastic-transport, elasticsearch
Successfully installed elastic-transport-8.13.1 elasticsearch-8.14.0


In [2]:
# Import packages

from elasticsearch import Elasticsearch
from elasticsearch.helpers import bulk
from getpass import getpass  # standard library module for secure credential input

## Initialize the Elasticsearch client

Now we can instantiate the Elasticsearch Python client, providing the Cloud ID and [API key](https://www.elastic.co/guide/en/kibana/current/api-keys.html#create-api-key) for your deployment.

> ℹ️ If you're running Elasticsearch locally or on self-managed infrastructure, you'll need to pass in the Elasticsearch host instead. [Read the docs](https://www.elastic.co/guide/en/elasticsearch/client/python-api/current/connecting.html#_verifying_https_with_certificate_fingerprints_python_3_10_or_later) about how to connect to Elasticsearch locally.

In [3]:
# https://www.elastic.co/search-labs/tutorials/install-elasticsearch/elastic-cloud#finding-your-cloud-id
ELASTIC_CLOUD_ID = getpass("Elastic Cloud ID: ")

# https://www.elastic.co/search-labs/tutorials/install-elasticsearch/elastic-cloud#creating-an-api-key
ELASTIC_API_KEY = getpass("Elastic API Key: ")

# Create the client instance
client = Elasticsearch(
    # For local development
    # hosts=["http://localhost:9200"]
    cloud_id=ELASTIC_CLOUD_ID,
    api_key=ELASTIC_API_KEY,
)

Elastic Cloud ID: ··········
Elastic API Key: ··········


## Add sample data to Elasticsearch

Before we index our sample dataset, let's create an index named `sample_data` with the correct [mappings](https://www.elastic.co/guide/en/elasticsearch/reference/current/mapping.html).

In [4]:
index_name = "sample_data"

mappings = {
    "mappings": {
        "properties": {"client_ip": {"type": "ip"}, "message": {"type": "keyword"}}
    }
}

# Create the index
if not client.indices.exists(index=index_name):
    client.indices.create(index=index_name, body=mappings)

Now we can index the data using the Elasticsearch Python client's [bulk helpers](https://elasticsearch-py.readthedocs.io/en/latest/helpers.html#bulk-helpers).

In [5]:
# Documents to be indexed
documents = [
    {
        "@timestamp": "2023-10-23T12:15:03.360Z",
        "client_ip": "172.21.2.162",
        "message": "Connected to 10.1.0.3",
        "event_duration": 3450233,
    },
    {
        "@timestamp": "2023-10-23T12:27:28.948Z",
        "client_ip": "172.21.2.113",
        "message": "Connected to 10.1.0.2",
        "event_duration": 2764889,
    },
    {
        "@timestamp": "2023-10-23T13:33:34.937Z",
        "client_ip": "172.21.0.5",
        "message": "Disconnected",
        "event_duration": 1232382,
    },
    {
        "@timestamp": "2023-10-23T13:51:54.732Z",
        "client_ip": "172.21.3.15",
        "message": "Connection error",
        "event_duration": 725448,
    },
    {
        "@timestamp": "2023-10-23T13:52:55.015Z",
        "client_ip": "172.21.3.15",
        "message": "Connection error",
        "event_duration": 8268153,
    },
    {
        "@timestamp": "2023-10-23T13:53:55.832Z",
        "client_ip": "172.21.3.15",
        "message": "Connection error",
        "event_duration": 5033755,
    },
    {
        "@timestamp": "2023-10-23T13:55:01.543Z",
        "client_ip": "172.21.3.15",
        "message": "Connected to 10.1.0.1",
        "event_duration": 1756467,
    },
]

# Prepare the actions for the bulk API using list comprehension
actions = [{"_index": index_name, "_source": doc} for doc in documents]

# Perform the bulk index operation and capture the response.
# refresh=True makes the new documents searchable before the queries below run.
success, failed = bulk(client, actions, refresh=True)

if failed:
    print(f"Some documents failed to index: {failed}")
else:
    print(f"Successfully indexed {success} documents.")

Successfully indexed 7 documents.


In [6]:
# Suppress specific Elasticsearch warnings about default limit of [500] that pollute responses

import warnings
from elasticsearch import ElasticsearchWarning

warnings.filterwarnings("ignore", category=ElasticsearchWarning)

In [7]:
# Format response to return human-readable tables


def format_response(response_data):
    column_names = [col["name"] for col in response_data["columns"]]
    column_widths = [
        max(
            len(name),
            max(
                (
                    len(str(row[i]) if row[i] is not None else "None")
                    for row in response_data["values"]
                ),
                default=0,
            ),
        )
        for i, name in enumerate(column_names)
    ]
    row_format = " | ".join(["{:<" + str(width) + "}" for width in column_widths])
    print(row_format.format(*column_names))
    print("-" * sum(column_widths) + "-" * (len(column_widths) - 1) * 3)
    for row in response_data["values"]:
        # Convert None values in the row to "None" before formatting
        formatted_row = [(str(cell) if cell is not None else "None") for cell in row]
        print(row_format.format(*formatted_row))
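
The `columns` and `values` keys that `format_response` reads can also be reshaped for further processing in Python. The following is a minimal sketch, assuming the response shape used above; `rows_as_dicts` is a hypothetical helper and `sample_response` is a hand-built stand-in, not output from a real cluster.

```python
def rows_as_dicts(response_data):
    """Turn an ES|QL response into a list of {column_name: value} dicts."""
    column_names = [col["name"] for col in response_data["columns"]]
    return [dict(zip(column_names, row)) for row in response_data["values"]]


# Hand-built stand-in with the same shape that format_response expects
sample_response = {
    "columns": [
        {"name": "client_ip", "type": "ip"},
        {"name": "event_duration", "type": "long"},
    ],
    "values": [["172.21.0.5", 1232382], ["172.21.3.15", 725448]],
}

rows = rows_as_dicts(sample_response)
print(rows[0])
```

A list of dictionaries like this is convenient when you want to pass query results to other Python code instead of printing them.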

## Your first ES|QL query

Each ES|QL query starts with a [source command](https://www.elastic.co/guide/en/elasticsearch/reference/current/esql-commands.html#esql-source-commands "Source commands"). A source command produces a table, typically with data from Elasticsearch.

The [`FROM`](https://www.elastic.co/guide/en/elasticsearch/reference/current/esql-commands.html#esql-from "FROM") source command returns a table with documents from a data stream, index, or alias. Each row in the resulting table represents a document. This query returns up to 500 documents from the `sample_data` index:

In [8]:
esql_query = "FROM sample_data"

response = client.esql.query(query=esql_query)
format_response(response)

@timestamp               | client_ip    | event_duration | message              
--------------------------------------------------------------------------------
2023-10-23T12:15:03.360Z | 172.21.2.162 | 3450233        | Connected to 10.1.0.3
2023-10-23T12:27:28.948Z | 172.21.2.113 | 2764889        | Connected to 10.1.0.2
2023-10-23T13:33:34.937Z | 172.21.0.5   | 1232382        | Disconnected         
2023-10-23T13:51:54.732Z | 172.21.3.15  | 725448         | Connection error     
2023-10-23T13:52:55.015Z | 172.21.3.15  | 8268153        | Connection error     
2023-10-23T13:53:55.832Z | 172.21.3.15  | 5033755        | Connection error     
2023-10-23T13:55:01.543Z | 172.21.3.15  | 1756467        | Connected to 10.1.0.1


Each column corresponds to a field, and can be accessed by the name of that field.

> ℹ️ ES|QL keywords are case-insensitive. `FROM sample_data` is identical to `from sample_data`.

## Processing commands

A source command can be followed by one or more [processing commands](https://www.elastic.co/guide/en/elasticsearch/reference/current/esql-commands.html#esql-processing-commands), separated by a pipe character: `|`. Processing commands change an input table by adding, removing, or changing rows and columns. Processing commands can perform filtering, projection, aggregation, and more.

For example, you can use the `LIMIT` command to limit the number of rows that are returned, up to a maximum of 10,000 rows:

In [9]:
esql_query = """
FROM sample_data
| LIMIT 3
"""

response = client.esql.query(query=esql_query)
format_response(response)

@timestamp               | client_ip    | event_duration | message              
--------------------------------------------------------------------------------
2023-10-23T12:15:03.360Z | 172.21.2.162 | 3450233        | Connected to 10.1.0.3
2023-10-23T12:27:28.948Z | 172.21.2.113 | 2764889        | Connected to 10.1.0.2
2023-10-23T13:33:34.937Z | 172.21.0.5   | 1232382        | Disconnected         


### Sort a table

Another processing command is the [`SORT`](https://www.elastic.co/guide/en/elasticsearch/reference/current/esql-commands.html#esql-sort) command. By default, the rows returned by `FROM` don’t have a defined sort order. Use the `SORT` command to sort rows on one or more columns:

In [10]:
esql_query = """
FROM sample_data
| SORT @timestamp DESC
"""

response = client.esql.query(query=esql_query)
format_response(response)

@timestamp               | client_ip    | event_duration | message              
--------------------------------------------------------------------------------
2023-10-23T13:55:01.543Z | 172.21.3.15  | 1756467        | Connected to 10.1.0.1
2023-10-23T13:53:55.832Z | 172.21.3.15  | 5033755        | Connection error     
2023-10-23T13:52:55.015Z | 172.21.3.15  | 8268153        | Connection error     
2023-10-23T13:51:54.732Z | 172.21.3.15  | 725448         | Connection error     
2023-10-23T13:33:34.937Z | 172.21.0.5   | 1232382        | Disconnected         
2023-10-23T12:27:28.948Z | 172.21.2.113 | 2764889        | Connected to 10.1.0.2
2023-10-23T12:15:03.360Z | 172.21.2.162 | 3450233        | Connected to 10.1.0.3


### Query the data

Use the [`WHERE`](https://www.elastic.co/guide/en/elasticsearch/reference/current/esql-commands.html#esql-where) command to filter the data. For example, to find all events with a duration longer than 5 ms (`event_duration` is stored in nanoseconds, so the threshold is 5,000,000):

In [11]:
esql_query = """
FROM sample_data
| WHERE event_duration > 5000000
"""

response = client.esql.query(query=esql_query)
format_response(response)

@timestamp               | client_ip   | event_duration | message         
--------------------------------------------------------------------------
2023-10-23T13:52:55.015Z | 172.21.3.15 | 8268153        | Connection error
2023-10-23T13:53:55.832Z | 172.21.3.15 | 5033755        | Connection error
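
If you prefer to think in milliseconds, you can build the filter from a millisecond threshold. This is only a sketch: `duration_filter` is a hypothetical helper, and it assumes `event_duration` holds nanoseconds, as in the sample data.

```python
NANOS_PER_MILLI = 1_000_000


def duration_filter(threshold_ms):
    """Build a WHERE clause for event_duration from a millisecond threshold."""
    threshold_ns = int(threshold_ms * NANOS_PER_MILLI)
    return f"WHERE event_duration > {threshold_ns}"


# Same query as above, with the 5 ms threshold converted to nanoseconds
esql_query = f"FROM sample_data\n| {duration_filter(5)}"
print(esql_query)
```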


`WHERE` supports several [operators](https://www.elastic.co/guide/en/elasticsearch/reference/current/esql-functions-operators.html#esql-operators).

For example, you can use [`LIKE`](https://www.elastic.co/guide/en/elasticsearch/reference/current/esql-functions-operators.html#esql-like-operator) to run a wildcard query against the message column:

In [12]:
esql_query = """
FROM sample_data
| WHERE message LIKE "Connected*"
"""

response = client.esql.query(query=esql_query)
format_response(response)

@timestamp               | client_ip    | event_duration | message              
--------------------------------------------------------------------------------
2023-10-23T12:15:03.360Z | 172.21.2.162 | 3450233        | Connected to 10.1.0.3
2023-10-23T12:27:28.948Z | 172.21.2.113 | 2764889        | Connected to 10.1.0.2
2023-10-23T13:55:01.543Z | 172.21.3.15  | 1756467        | Connected to 10.1.0.1
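
To get a feel for how the `Connected*` pattern selects rows, you can try a similar wildcard match in plain Python. This is an analogy rather than a reimplementation: the standard library's `fnmatchcase` also understands `[...]` character classes, so it only approximates `LIKE` for simple patterns like this one.

```python
from fnmatch import fnmatchcase

# The message values from the sample data
messages = [
    "Connected to 10.1.0.3",
    "Connected to 10.1.0.2",
    "Disconnected",
    "Connection error",
    "Connection error",
    "Connection error",
    "Connected to 10.1.0.1",
]

# fnmatchcase is case-sensitive and treats * and ? as wildcards
matches = [m for m in messages if fnmatchcase(m, "Connected*")]
print(matches)
```

The pattern must match the whole value from its first character, which is why `Disconnected` and `Connection error` are left out.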


### More processing commands

There are many other processing commands, like [`KEEP`](https://www.elastic.co/guide/en/elasticsearch/reference/current/esql-commands.html#esql-keep "KEEP") and [`DROP`](https://www.elastic.co/guide/en/elasticsearch/reference/current/esql-commands.html#esql-drop "DROP") to keep or drop columns, [`ENRICH`](https://www.elastic.co/guide/en/elasticsearch/reference/current/esql-commands.html#esql-enrich "ENRICH") to enrich a table with data from indices in Elasticsearch, and [`DISSECT`](https://www.elastic.co/guide/en/elasticsearch/reference/current/esql-commands.html#esql-dissect "DISSECT") and [`GROK`](https://www.elastic.co/guide/en/elasticsearch/reference/current/esql-commands.html#esql-grok "GROK") to process data. Refer to [Processing commands](https://www.elastic.co/guide/en/elasticsearch/reference/current/esql-commands.html#esql-processing-commands "Processing commands") for an overview.

## Chain processing commands

You can chain processing commands, separated by a pipe character: `|`. Each
processing command works on the output table of the previous command. The result
of a query is the table produced by the final processing command.

The following example first sorts the table on `@timestamp`, and next limits the
result set to 3 rows:

In [13]:
esql_query = """
FROM sample_data
| SORT @timestamp DESC
| LIMIT 3
"""

response = client.esql.query(query=esql_query)
format_response(response)

@timestamp               | client_ip   | event_duration | message              
-------------------------------------------------------------------------------
2023-10-23T13:55:01.543Z | 172.21.3.15 | 1756467        | Connected to 10.1.0.1
2023-10-23T13:53:55.832Z | 172.21.3.15 | 5033755        | Connection error     
2023-10-23T13:52:55.015Z | 172.21.3.15 | 8268153        | Connection error     


> ℹ️ The order of processing commands is important. Limiting the result set to 3 rows first and then sorting those 3 rows would most likely return a different result than this example, where the sort comes before the limit.
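
The effect of command order can be illustrated in plain Python. This sketch assumes that a `LIMIT` placed first would keep the first three documents in indexing order; in Elasticsearch the rows kept by an unsorted `LIMIT` are not guaranteed.

```python
# Timestamps in the order the sample documents were indexed
timestamps = [
    "2023-10-23T12:15:03.360Z",
    "2023-10-23T12:27:28.948Z",
    "2023-10-23T13:33:34.937Z",
    "2023-10-23T13:51:54.732Z",
    "2023-10-23T13:52:55.015Z",
    "2023-10-23T13:53:55.832Z",
    "2023-10-23T13:55:01.543Z",
]

# SORT @timestamp DESC | LIMIT 3: sort everything, then keep three rows
sort_then_limit = sorted(timestamps, reverse=True)[:3]

# LIMIT 3 | SORT @timestamp DESC: keep three rows first, then sort only those
limit_then_sort = sorted(timestamps[:3], reverse=True)

print(sort_then_limit)
print(limit_then_sort)
```

The first list contains the three most recent events, while the second contains whichever three rows happened to survive the limit.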

## Compute values

Use the [`EVAL`](https://www.elastic.co/guide/en/elasticsearch/reference/current/esql-commands.html#esql-eval "EVAL") command to append columns with calculated values to a table. For example, the following query appends a `duration_ms` column whose values are computed by dividing `event_duration` by 1,000,000, which converts `event_duration` from nanoseconds to milliseconds.

In [14]:
esql_query = """
FROM sample_data
| EVAL duration_ms = event_duration/1000000.0
"""

response = client.esql.query(query=esql_query)
format_response(response)

@timestamp               | client_ip    | event_duration | message               | duration_ms
----------------------------------------------------------------------------------------------
2023-10-23T12:15:03.360Z | 172.21.2.162 | 3450233        | Connected to 10.1.0.3 | 3.450233   
2023-10-23T12:27:28.948Z | 172.21.2.113 | 2764889        | Connected to 10.1.0.2 | 2.764889   
2023-10-23T13:33:34.937Z | 172.21.0.5   | 1232382        | Disconnected          | 1.232382   
2023-10-23T13:51:54.732Z | 172.21.3.15  | 725448         | Connection error      | 0.725448   
2023-10-23T13:52:55.015Z | 172.21.3.15  | 8268153        | Connection error      | 8.268153   
2023-10-23T13:53:55.832Z | 172.21.3.15  | 5033755        | Connection error      | 5.033755   
2023-10-23T13:55:01.543Z | 172.21.3.15  | 1756467        | Connected to 10.1.0.1 | 1.756467   


`EVAL` supports several [functions](https://www.elastic.co/guide/en/elasticsearch/reference/current/esql-functions-operators.html#esql-functions). For example, to round a number to the closest number with the specified number of digits, use the [`ROUND`](https://www.elastic.co/guide/en/elasticsearch/reference/current/esql-functions-operators.html#esql-round "ROUND") function:

In [15]:
esql_query = """
FROM sample_data
| EVAL duration_ms = ROUND(event_duration/1000000.0, 1)
"""

response = client.esql.query(query=esql_query)
format_response(response)

@timestamp               | client_ip    | event_duration | message               | duration_ms
----------------------------------------------------------------------------------------------
2023-10-23T12:15:03.360Z | 172.21.2.162 | 3450233        | Connected to 10.1.0.3 | 3.5        
2023-10-23T12:27:28.948Z | 172.21.2.113 | 2764889        | Connected to 10.1.0.2 | 2.8        
2023-10-23T13:33:34.937Z | 172.21.0.5   | 1232382        | Disconnected          | 1.2        
2023-10-23T13:51:54.732Z | 172.21.3.15  | 725448         | Connection error      | 0.7        
2023-10-23T13:52:55.015Z | 172.21.3.15  | 8268153        | Connection error      | 8.3        
2023-10-23T13:53:55.832Z | 172.21.3.15  | 5033755        | Connection error      | 5.0        
2023-10-23T13:55:01.543Z | 172.21.3.15  | 1756467        | Connected to 10.1.0.1 | 1.8        
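
You can cross-check the `duration_ms` column with the same arithmetic in plain Python. This is a sketch using the sample values; Python's `round` may treat exact ties differently from ES|QL's `ROUND`, but none of these values is a tie.

```python
# event_duration values from the sample data, in nanoseconds
event_durations_ns = [3450233, 2764889, 1232382, 725448, 8268153, 5033755, 1756467]

# Same arithmetic as EVAL duration_ms = ROUND(event_duration/1000000.0, 1)
duration_ms = [round(ns / 1_000_000.0, 1) for ns in event_durations_ns]
print(duration_ms)
```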


## Calculate statistics

You can also use ES|QL to aggregate your data. Use the [`STATS ... BY`](https://www.elastic.co/guide/en/elasticsearch/reference/current/esql-commands.html#esql-stats-by "STATS ... BY") command to calculate statistics.

For example, to calculate the median duration:

In [16]:
esql_query = """
FROM sample_data
| STATS median_duration = MEDIAN(event_duration)
"""

response = client.esql.query(query=esql_query)
format_response(response)

median_duration
---------------
2764889.0      


You can calculate multiple stats with one command:

In [17]:
esql_query = """
FROM sample_data
| STATS median_duration = MEDIAN(event_duration), max_duration = MAX(event_duration)
"""

response = client.esql.query(query=esql_query)
format_response(response)

median_duration | max_duration
------------------------------
2764889.0       | 8268153     


Use `BY` to group calculated stats by one or more columns. For example, to calculate the median duration per client IP:

In [18]:
esql_query = """
FROM sample_data
| STATS median_duration = MEDIAN(event_duration) BY client_ip
"""

response = client.esql.query(query=esql_query)
format_response(response)

median_duration | client_ip   
------------------------------
1232382.0       | 172.21.0.5  
2764889.0       | 172.21.2.113
3450233.0       | 172.21.2.162
3395111.0       | 172.21.3.15 
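
Conceptually, `STATS ... BY` splits the rows into groups and computes one value per group. The sketch below mimics that in plain Python for the sample data. It uses the standard library's exact `median`, whereas the ES|QL documentation notes that `MEDIAN` can be approximate on large data sets.

```python
from collections import defaultdict
from statistics import median

# (client_ip, event_duration) pairs from the sample data
events = [
    ("172.21.2.162", 3450233),
    ("172.21.2.113", 2764889),
    ("172.21.0.5", 1232382),
    ("172.21.3.15", 725448),
    ("172.21.3.15", 8268153),
    ("172.21.3.15", 5033755),
    ("172.21.3.15", 1756467),
]

# BY client_ip: collect the durations for each group
durations_by_ip = defaultdict(list)
for client_ip, duration in events:
    durations_by_ip[client_ip].append(duration)

# MEDIAN(event_duration): one value per group
median_by_ip = {ip: float(median(values)) for ip, values in durations_by_ip.items()}
print(median_by_ip)
```

For `172.21.3.15`, which has four events, the median is the mean of the two middle values: (1756467 + 5033755) / 2 = 3395111.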


## Access columns

You can access columns by their name. If a name contains special characters, [it needs to be quoted](https://www.elastic.co/guide/en/elasticsearch/reference/current/esql-syntax.html#esql-identifiers "Identifiers") with backticks (`` ` ``).

Assigning an explicit name to a column created by `EVAL` or `STATS` is optional. If you don’t provide a name, the new column name is equal to the function expression. For example:

In [19]:
esql_query = """
FROM sample_data
| EVAL event_duration/1000000.0
"""

response = client.esql.query(query=esql_query)
format_response(response)

@timestamp               | client_ip    | event_duration | message               | event_duration/1000000.0
-----------------------------------------------------------------------------------------------------------
2023-10-23T12:15:03.360Z | 172.21.2.162 | 3450233        | Connected to 10.1.0.3 | 3.450233                
2023-10-23T12:27:28.948Z | 172.21.2.113 | 2764889        | Connected to 10.1.0.2 | 2.764889                
2023-10-23T13:33:34.937Z | 172.21.0.5   | 1232382        | Disconnected          | 1.232382                
2023-10-23T13:51:54.732Z | 172.21.3.15  | 725448         | Connection error      | 0.725448                
2023-10-23T13:52:55.015Z | 172.21.3.15  | 8268153        | Connection error      | 8.268153                
2023-10-23T13:53:55.832Z | 172.21.3.15  | 5033755        | Connection error      | 5.033755                
2023-10-23T13:55:01.543Z | 172.21.3.15  | 1756467        | Connected to 10.1.0.1 | 1.756467                


In this query, `EVAL` adds a new column named `event_duration/1000000.0`. Because its name contains special characters, to access this column, quote it with backticks:

In [20]:
esql_query = """
FROM sample_data
| EVAL event_duration/1000000.0
| STATS MEDIAN(`event_duration/1000000.0`)
"""
response = client.esql.query(query=esql_query)
format_response(response)

MEDIAN(`event_duration/1000000.0`)
----------------------------------
2.764889                          
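
When you build queries programmatically, a small helper can add the backticks for you. This is a hedged sketch: `quote_identifier` is a hypothetical helper, and the rule it encodes (unquoted names start with a letter, `_`, or `@` and continue with letters, digits, or `_`) is an assumption based on the identifier rules linked above, so check it against the syntax reference for your version.

```python
import re

# Assumed rule: a name is safe unquoted if it starts with a letter, _ or @
# and continues with letters, digits or _ only.
_UNQUOTED = re.compile(r"[A-Za-z_@][A-Za-z0-9_]*")


def quote_identifier(name):
    """Wrap a column name in backticks when it contains special characters."""
    if _UNQUOTED.fullmatch(name):
        return name
    # Assumption: a literal backtick inside the name is escaped by doubling it
    return "`" + name.replace("`", "``") + "`"


column = "event_duration/1000000.0"
stats_command = f"STATS MEDIAN({quote_identifier(column)})"
print(quote_identifier("event_duration"))
print(stats_command)
```

The helper is deliberately conservative: it quotes anything it is unsure about, including dotted names.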


## Create a histogram

To track statistics over time, ES|QL enables you to create histograms using the [`BUCKET`](https://www.elastic.co/guide/en/elasticsearch/reference/current/esql-functions-operators.html#esql-bucket "BUCKET") function. `BUCKET` creates human-friendly bucket sizes and returns a value for each row that corresponds to the resulting bucket the row falls into.

> ℹ️ The `BUCKET` function must be used together with the [`STATS ... BY`](https://www.elastic.co/guide/en/elasticsearch/reference/current/esql-commands.html#esql-stats-by "STATS ... BY") command. It replaces the `AUTO_BUCKET` function which was removed in 8.14.0.

For example, the following query requests a target of 24 buckets across October 23rd, which results in hourly buckets, and counts the number of events per hour:

In [49]:
esql_query = """
FROM sample_data
| KEEP @timestamp
| STATS COUNT(*) BY bucket = BUCKET(@timestamp, 24, "2023-10-23T00:00:00Z", "2023-10-23T23:59:59Z")
"""
response = client.esql.query(query=esql_query)
format_response(response)

COUNT(*) | bucket                  
-----------------------------------
2        | 2023-10-23T12:00:00.000Z
5        | 2023-10-23T13:00:00.000Z


Or the median duration per hour:

In [27]:
esql_query = """
FROM sample_data
| KEEP @timestamp, event_duration
| STATS median_duration = MEDIAN(event_duration) BY bucket = BUCKET(@timestamp, 24, "2023-10-23T00:00:00Z", "2023-10-23T23:59:59Z")
"""

response = client.esql.query(query=esql_query)
format_response(response)

median_duration | bucket                  
------------------------------------------
3107561.0       | 2023-10-23T12:00:00.000Z
1756467.0       | 2023-10-23T13:00:00.000Z
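
To see what hourly bucketing does to each row, here is a plain Python sketch that truncates every timestamp to the start of its hour and counts the rows per bucket. It assumes the one-hour bucket size seen in the output above and does not try to reproduce how `BUCKET` chooses a size.

```python
from collections import Counter
from datetime import datetime

# @timestamp values from the sample data
timestamps = [
    "2023-10-23T12:15:03.360Z",
    "2023-10-23T12:27:28.948Z",
    "2023-10-23T13:33:34.937Z",
    "2023-10-23T13:51:54.732Z",
    "2023-10-23T13:52:55.015Z",
    "2023-10-23T13:53:55.832Z",
    "2023-10-23T13:55:01.543Z",
]


def hour_bucket(ts):
    """Truncate an ISO 8601 UTC timestamp to the start of its hour."""
    parsed = datetime.strptime(ts, "%Y-%m-%dT%H:%M:%S.%fZ")
    return parsed.replace(minute=0, second=0, microsecond=0)


# Equivalent in spirit to STATS COUNT(*) BY bucket
events_per_hour = Counter(hour_bucket(ts) for ts in timestamps)
for bucket, count in sorted(events_per_hour.items()):
    print(count, bucket.isoformat())
```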


## Enrich data

ES|QL enables you to [enrich](https://www.elastic.co/guide/en/elasticsearch/reference/current/esql-enrich-data.html "Data enrichment") a table with data from indices in Elasticsearch, using the [`ENRICH`](https://www.elastic.co/guide/en/elasticsearch/reference/current/esql-commands.html#esql-enrich "ENRICH") command.

> ℹ️ Before you can use `ENRICH`, you first need to [create](https://www.elastic.co/guide/en/elasticsearch/reference/current/esql-enrich-data.html#esql-create-enrich-policy "Create an enrich policy") and [execute](https://www.elastic.co/guide/en/elasticsearch/reference/current/esql-enrich-data.html#esql-execute-enrich-policy "Execute the enrich policy") an [enrich policy](https://www.elastic.co/guide/en/elasticsearch/reference/current/esql-enrich-data.html#esql-enrich-policy).

The following requests create and execute a policy called `clientip_policy`. The policy links an IP address to an environment ("Development", "QA", or "Production").



In [45]:
# Define the mapping
mapping = {
    "mappings": {
        "properties": {"client_ip": {"type": "keyword"}, "env": {"type": "keyword"}}
    }
}

# Create the index with the mapping
client.indices.create(index="clientips", body=mapping)

# Prepare bulk data
bulk_data = [
    {"index": {}},
    {"client_ip": "172.21.0.5", "env": "Development"},
    {"index": {}},
    {"client_ip": "172.21.2.113", "env": "QA"},
    {"index": {}},
    {"client_ip": "172.21.2.162", "env": "QA"},
    {"index": {}},
    {"client_ip": "172.21.3.15", "env": "Production"},
    {"index": {}},
    {"client_ip": "172.21.3.16", "env": "Production"},
]

# Bulk index the data; refresh=True makes it searchable before the policy runs
client.bulk(index="clientips", body=bulk_data, refresh=True)

# Define the enrich policy
policy = {
    "match": {
        "indices": "clientips",
        "match_field": "client_ip",
        "enrich_fields": ["env"],
    }
}

# Put the enrich policy
client.enrich.put_policy(name="clientip_policy", body=policy)

# Execute the enrich policy and wait for completion
client.enrich.execute_policy(name="clientip_policy", wait_for_completion=True)

ObjectApiResponse({'status': {'phase': 'COMPLETE'}})

After creating and executing a policy, you can use it with the `ENRICH` command:

In [46]:
esql_query = """
FROM sample_data
| KEEP @timestamp, client_ip, event_duration
| EVAL client_ip = TO_STRING(client_ip)
| ENRICH clientip_policy ON client_ip WITH env
"""
response = client.esql.query(query=esql_query)
format_response(response)

@timestamp               | event_duration | client_ip    | env        
----------------------------------------------------------------------
2023-10-23T12:15:03.360Z | 3450233        | 172.21.2.162 | QA         
2023-10-23T12:27:28.948Z | 2764889        | 172.21.2.113 | QA         
2023-10-23T13:33:34.937Z | 1232382        | 172.21.0.5   | Development
2023-10-23T13:51:54.732Z | 725448         | 172.21.3.15  | Production 
2023-10-23T13:52:55.015Z | 8268153        | 172.21.3.15  | Production 
2023-10-23T13:53:55.832Z | 5033755        | 172.21.3.15  | Production 
2023-10-23T13:55:01.543Z | 1756467        | 172.21.3.15  | Production 


You can use the new `env` column that’s added by the `ENRICH` command in subsequent commands. For example, to calculate the median duration per environment:

In [47]:
esql_query = """
FROM sample_data
| KEEP @timestamp, client_ip, event_duration
| EVAL client_ip = TO_STRING(client_ip)
| ENRICH clientip_policy ON client_ip WITH env
| STATS median_duration = MEDIAN(event_duration) BY env
"""
response = client.esql.query(query=esql_query)
format_response(response)

median_duration | env        
-----------------------------
3107561.0       | QA         
1232382.0       | Development
3395111.0       | Production 
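
A match enrich policy behaves much like a dictionary lookup keyed on the match field. The sketch below reproduces the two queries above in plain Python using the sample values; it is only an illustration of the idea, not of how Elasticsearch executes `ENRICH`.

```python
from collections import defaultdict
from statistics import median

# Contents of the clientips index, keyed by the policy's match_field
env_by_ip = {
    "172.21.0.5": "Development",
    "172.21.2.113": "QA",
    "172.21.2.162": "QA",
    "172.21.3.15": "Production",
    "172.21.3.16": "Production",
}

# (client_ip, event_duration) pairs from the sample data
events = [
    ("172.21.2.162", 3450233),
    ("172.21.2.113", 2764889),
    ("172.21.0.5", 1232382),
    ("172.21.3.15", 725448),
    ("172.21.3.15", 8268153),
    ("172.21.3.15", 5033755),
    ("172.21.3.15", 1756467),
]

# ENRICH ... ON client_ip WITH env: look up each row's client_ip
# (.get returns None when there is no matching entry)
enriched = [(ip, duration, env_by_ip.get(ip)) for ip, duration in events]

# STATS median_duration = MEDIAN(event_duration) BY env
durations_by_env = defaultdict(list)
for _ip, duration, env in enriched:
    durations_by_env[env].append(duration)

median_by_env = {env: float(median(v)) for env, v in durations_by_env.items()}
print(median_by_env)
```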


For more about data enrichment with ES|QL, refer to [Data enrichment](https://www.elastic.co/guide/en/elasticsearch/reference/current/esql-enrich-data.html "Data enrichment").

## Process data

Your data may contain unstructured strings that you want to [structure](https://www.elastic.co/guide/en/elasticsearch/reference/current/esql-process-data-with-dissect-and-grok.html "Data processing with DISSECT and GROK") to make it easier to analyze the data. For example, the sample data contains log messages like:

```
"Connected to 10.1.0.3"
```

By extracting the IP address from these messages, you can determine which IP has accepted the most client connections.

To structure unstructured strings at query time, you can use the ES|QL [`DISSECT`](https://www.elastic.co/guide/en/elasticsearch/reference/current/esql-commands.html#esql-dissect "DISSECT") and [`GROK`](https://www.elastic.co/guide/en/elasticsearch/reference/current/esql-commands.html#esql-grok "GROK") commands. `DISSECT` works by breaking up a string using a delimiter-based pattern. `GROK` works similarly, but uses regular expressions. This makes `GROK` more powerful, but generally also slower.

In this case, no regular expressions are needed, as the `message` is straightforward: "Connected to ", followed by the server IP. To match this string, you can use the following `DISSECT` command:

In [37]:
esql_query = """
FROM sample_data
| DISSECT message "Connected to %{server_ip}"
"""
response = client.esql.query(query=esql_query)
format_response(response)

@timestamp               | client_ip    | event_duration | message               | server_ip
--------------------------------------------------------------------------------------------
2023-10-23T12:15:03.360Z | 172.21.2.162 | 3450233        | Connected to 10.1.0.3 | 10.1.0.3 
2023-10-23T12:27:28.948Z | 172.21.2.113 | 2764889        | Connected to 10.1.0.2 | 10.1.0.2 
2023-10-23T13:33:34.937Z | 172.21.0.5   | 1232382        | Disconnected          | None     
2023-10-23T13:51:54.732Z | 172.21.3.15  | 725448         | Connection error      | None     
2023-10-23T13:52:55.015Z | 172.21.3.15  | 8268153        | Connection error      | None     
2023-10-23T13:53:55.832Z | 172.21.3.15  | 5033755        | Connection error      | None     
2023-10-23T13:55:01.543Z | 172.21.3.15  | 1756467        | Connected to 10.1.0.1 | 10.1.0.1 


This adds a `server_ip` column to those rows that have a `message` that matches this pattern. For other rows, the value of `server_ip` is `null`, which the Python output above shows as `None`.

You can use the new `server_ip` column that’s added by the `DISSECT` command in subsequent commands. For example, to determine how many connections each server has accepted:

In [48]:
esql_query = """
FROM sample_data
| WHERE STARTS_WITH(message, "Connected to")
| DISSECT message "Connected to %{server_ip}"
| STATS COUNT(*) BY server_ip
"""
response = client.esql.query(query=esql_query)
format_response(response)

COUNT(*) | server_ip
--------------------
1        | 10.1.0.3 
1        | 10.1.0.2 
1        | 10.1.0.1 
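
For this particular pattern, the extraction amounts to checking for a literal prefix and keeping the rest of the string. The following plain Python sketch mirrors that behavior on the sample messages; `dissect_server_ip` is a hypothetical helper that handles only this one pattern, not general `DISSECT` syntax.

```python
from collections import Counter

# The message values from the sample data
messages = [
    "Connected to 10.1.0.3",
    "Connected to 10.1.0.2",
    "Disconnected",
    "Connection error",
    "Connection error",
    "Connection error",
    "Connected to 10.1.0.1",
]

PREFIX = "Connected to "


def dissect_server_ip(message):
    """Return the text after the literal prefix, or None when it is absent."""
    if message.startswith(PREFIX):
        return message[len(PREFIX):]
    return None


# One extracted value per row, None where the pattern does not match
server_ips = [dissect_server_ip(m) for m in messages]

# Count connections per server, skipping rows without a match
connections = Counter(ip for ip in server_ips if ip is not None)
print(server_ips)
print(connections)
```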


> ℹ️ To learn more about data processing with ES|QL, refer to [Data processing with DISSECT and GROK](https://www.elastic.co/guide/en/elasticsearch/reference/current/esql-process-data-with-dissect-and-grok.html "Data processing with DISSECT and GROK").

## Learn more

To learn more about ES|QL, refer to:
- [_Learning ES|QL_](https://www.elastic.co/guide/en/elasticsearch/reference/current/esql-language.html "Learning ES|QL")
- [_Using ES|QL_](https://www.elastic.co/guide/en/elasticsearch/reference/current/esql-using.html "Using ES|QL")