29 - Dealing With Mapping Exceptions


WEBVTT

00:06.960 --> 00:12.180


Mapping is an essential foundation of an index, and can generally be considered the
heart of Elasticsearch.

00:12.900 --> 00:16.020


So you can be sure of the importance of a well-managed mapping.

00:16.740 --> 00:20.820


But just as it is with many important things, sometimes mappings can go wrong.

00:21.300 --> 00:25.350


We'll take a look at various issues that can arise with mappings and how to deal
with them.

00:28.720 --> 00:34.060


Before delving into the possible challenges with mappings, let's quickly recap some
key points about

00:34.060 --> 00:34.630


mappings.

00:35.350 --> 00:42.220


A mapping essentially entails two parts: the process, a process of defining how your
JSON documents will

00:42.220 --> 00:48.160


be stored in an index, and the result, the actual metadata structure resulting from
the definition process.

00:51.960 --> 00:56.850


If we first consider the process aspect of the mapping definition, there are
generally two ways this

00:56.850 --> 00:57.390


can happen.

00:58.170 --> 01:03.120


An explicit mapping process, where you define what fields and their types you want to
store, along with

01:03.120 --> 01:04.350


any additional parameters.

01:05.340 --> 01:10.950


A dynamic mapping process, where Elasticsearch automatically attempts to determine the appropriate
data type and updates

01:10.950 --> 01:12.030


the mapping accordingly.

01:17.340 --> 01:22.530


The result of the mapping process defines what we can index via individual fields
and their data types,

01:22.530 --> 01:25.950


and also how the indexing happens via related parameters.
01:26.670 --> 01:28.380
Consider this mapping example here.

01:29.190 --> 01:32.850


It's a very simple mapping example for a basic log collection microservice.

01:33.570 --> 01:37.620


The individual logs consist of the following fields and their associated data
types.

01:38.130 --> 01:44.280


The timestamp of the log is mapped as a date; the service name, which created the log, is
mapped as a keyword;

01:44.880 --> 01:50.070


the IP of the host on which the log was produced is mapped as an IP data type; the port
number is mapped as

01:50.070 --> 01:50.640


an integer.

01:51.150 --> 01:57.060


The actual log message is mapped as text to enable full-text searching and more. We
have not disabled the

01:57.060 --> 01:58.890


default dynamic mapping process,

01:58.980 --> 02:03.540


so we'll be able to see how we can introduce new fields arbitrarily, and they will
be added to the mapping

02:03.540 --> 02:04.260


automatically.
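
For reference, the mapping being described would look roughly like the sketch below. This is only an illustration based on the narration; the exact request is in the course cheat sheet, and field names such as service and host_ip are assumptions here.

# Create the index with an explicit mapping (field names assumed from the narration)
curl --location --request PUT 'http://localhost:9200/microservice-logs' \
  --header 'Content-Type: application/json' \
  --data-raw '{
    "mappings": {
      "properties": {
        "timestamp": { "type": "date" },
        "service":   { "type": "keyword" },
        "host_ip":   { "type": "ip" },
        "port":      { "type": "integer" },
        "message":   { "type": "text" }
      }
    }
  }'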

02:08.380 --> 02:09.790


So what could go wrong?

02:10.390 --> 02:14.470


There are generally two potential issues that many will end up facing with
mappings.

02:15.460 --> 02:20.230


If we create an explicit mapping and fields don't match, we'll get an exception if
the mismatch falls

02:20.230 --> 02:21.850


beyond a certain safety zone.

02:22.390 --> 02:25.030


We'll explain this in more detail later in the exercise.

02:25.930 --> 02:31.180


If we keep the default dynamic mapping and then introduce many more fields, we're
in for a mapping

02:31.180 --> 02:33.340


explosion which can take our cluster down.
02:37.980 --> 02:42.180
Let's continue with some interesting hands on examples where we'll simulate the
issues and attempt to

02:42.180 --> 02:42.810


resolve them.

02:45.830 --> 02:49.610


Let's get back to the safety zone we mentioned before, for when there's a mapping
mismatch.

02:50.270 --> 02:52.160


We'll create our index and see it in action.

02:52.370 --> 02:55.160


We are using the same exact mapping that we saw earlier.

02:55.160 --> 03:00.110


And to save you some typing, I've uploaded some of the larger commands in this
exercise to the web

03:00.110 --> 03:00.530


for you.

03:00.560 --> 03:07.250


So just head over to media.sundog-soft.com/es/exceptions.txt and you'll see

03:07.250 --> 03:09.260


this cheat sheet that you can just copy and paste from.

03:10.100 --> 03:11.600


So we'll start by creating our index.

03:11.600 --> 03:15.050


We're going to call it microservice-logs, containing the following properties.

03:15.350 --> 03:19.010


And note that we're defining the port as an integer type; that will be important
later on.

03:19.760 --> 03:20.960


Going to go ahead and copy that.

03:22.360 --> 03:24.460


And back to our terminal and right click to paste.

03:25.760 --> 03:26.090


All right.

03:27.770 --> 03:33.350


Now a well-defined JSON log for this mapping would look something like this in block two. Note that the

03:33.350 --> 03:36.890


port is defined as an integer, 12345, just like it should be.
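
A sketch of such a well-formed document, with purely illustrative values (the real one is in block two of the cheat sheet):

# Index a log document whose port is a proper integer
curl --location --request POST 'http://localhost:9200/microservice-logs/_doc?pretty' \
  --header 'Content-Type: application/json' \
  --data-raw '{ "timestamp": "2021-07-01T10:00:00Z", "service": "ABC", "host_ip": "10.0.2.15", "port": 12345, "message": "Started!" }'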

03:37.880 --> 03:41.930


But what if another service tries to log its port as a string and not a numeric
value?

03:42.440 --> 03:44.780


Notice that the port is in quotation marks here.

03:44.790 --> 03:46.940


I mean, that's actually a string containing the characters

03:46.940 --> 03:47.960


"15000".

03:48.560 --> 03:49.580


Well, let's try it out.
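
As a sketch, it is the same kind of document, except that the port value is now a quoted string:

# Same shape of document, but the port arrives as the string "15000"
curl --location --request POST 'http://localhost:9200/microservice-logs/_doc?pretty' \
  --header 'Content-Type: application/json' \
  --data-raw '{ "timestamp": "2021-07-01T10:01:00Z", "service": "XYZ", "host_ip": "10.0.2.16", "port": "15000", "message": "Hello!" }'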

03:50.810 --> 03:51.410


Copy that.

03:53.630 --> 03:54.650


And paste it in.

03:57.320 --> 03:57.640


Great.

03:57.650 --> 03:59.630


It actually worked without throwing an exception.

04:00.050 --> 04:02.300


This is that safety zone that I mentioned earlier: by default, Elasticsearch coerces numeric strings like "15000" into the mapped numeric type.

04:03.400 --> 04:07.600


But what if that service logs a string that has no relation to numeric values at
all into the port

04:07.610 --> 04:09.580


field, which we earlier defined as an integer?

04:09.610 --> 04:10.960


Well, let's see what happens then.

04:11.200 --> 04:15.980


So on this one, our message is "I am not well" because the port is actually the
string "NONE".

04:16.000 --> 04:17.140


That's not a number at all.

04:17.360 --> 04:18.400


Well, let's see what happens.
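
Again as a sketch with illustrative values, only the port and message differ from before:

# This time the port is the string "NONE", which cannot be coerced to an integer
curl --location --request POST 'http://localhost:9200/microservice-logs/_doc?pretty' \
  --header 'Content-Type: application/json' \
  --data-raw '{ "timestamp": "2021-07-01T10:02:00Z", "service": "XYZ", "host_ip": "10.0.2.16", "port": "NONE", "message": "I am not well!" }'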

04:19.520 --> 04:20.120


Copy that.

04:26.810 --> 04:29.930


A number_format_exception under a mapper_parsing_exception.

04:30.000 --> 04:30.410


Hmm.

04:31.010 --> 04:33.920


So we're now entering the world of Elasticsearch mapping exceptions.

04:34.070 --> 04:39.050


We've received a code 400 and the mapper_parsing_exception that is informing us
about our data type

04:39.050 --> 04:44.180


issue, specifically that it failed to parse the provided value of "NONE" to the type
integer.

04:45.250 --> 04:46.840


So how do we solve this kind of an issue?

04:47.410 --> 04:50.590


Well, unfortunately, there isn't a one size fits all solution.

04:51.280 --> 04:56.350


In this specific case, we can partially resolve the issue by defining an ignore_malformed mapping

04:56.350 --> 04:56.920


parameter.

04:57.580 --> 05:02.050


Now keep in mind, this parameter is non-dynamic, so you either need to set it when
creating your index,

05:02.320 --> 05:03.770


or you need to close the index,

05:03.790 --> 05:07.960


change the setting value, and then reopen the index, which is what we're going to do
right now, something

05:07.960 --> 05:08.440


like this.

05:09.220 --> 05:11.920


So let's run the commands in block five here, one at a time.

05:12.610 --> 05:13.960


First, we'll close our index.

05:18.480 --> 05:21.810


And then we'll set index.mapping.ignore_malformed to true.

05:27.610 --> 05:29.130


And we'll reopen that index.
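
The three steps narrated here look roughly like this (block five in the cheat sheet may format them slightly differently):

# 1. Close the index so the non-dynamic setting can be changed
curl --request POST 'http://localhost:9200/microservice-logs/_close'

# 2. Enable ignore_malformed at the index level
curl --request PUT 'http://localhost:9200/microservice-logs/_settings' \
  --header 'Content-Type: application/json' \
  --data-raw '{ "index.mapping.ignore_malformed": true }'

# 3. Reopen the index
curl --request POST 'http://localhost:9200/microservice-logs/_open'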

05:38.120 --> 05:42.450


All right, so now let's try to index that same document again.

05:42.470 --> 05:44.090


That's what's in block six here.

05:52.550 --> 05:53.030


All right.

05:53.040 --> 05:53.900


That one actually worked.

05:55.160 --> 05:59.330


Now, if we check the document by its ID, it will show us that the port field was
actually omitted

05:59.330 --> 06:01.500


from indexing, and we'll see it in the _ignored section.

06:01.520 --> 06:02.840


Let's see how that works.

06:03.290 --> 06:11.930


First, we need to copy that ID that we got back after inserting it, and we'll type
in curl http://

06:13.130 --> 06:18.320


localhost:9200/microservice-logs

06:19.740 --> 06:25.980


/_doc/, right click to paste in that ID, then ?pretty.

06:27.780 --> 06:28.410


Single quote.
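
Put together, the request looks roughly like this, where THE_DOCUMENT_ID is a placeholder for the ID you copied; when a field has been skipped, the response lists it under the _ignored metadata field:

# Fetch the document by ID and look for the "_ignored" section in the response
curl 'http://localhost:9200/microservice-logs/_doc/THE_DOCUMENT_ID?pretty'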

06:30.710 --> 06:34.160


So note here, it's telling you that the port field was ignored due to that rule.

06:35.000 --> 06:38.900


Now, the reason this is only a partial solution is because the setting has its
limits and they are

06:38.900 --> 06:39.950


quite considerable.

06:40.460 --> 06:42.110


Let's reveal one in the next example.

06:42.800 --> 06:47.510


A developer might decide that when a microservice receives some API request, it
should log the received

06:47.510 --> 06:49.610


JSON payload in the message field.

06:50.210 --> 06:55.240


Now, we already mapped the message field as text and we still have the ignore_malformed
parameter set.

06:55.250 --> 06:56.270


So what would happen?

06:56.450 --> 06:57.200


Well, let's see.

06:57.770 --> 06:59.390


We'll copy block seven here.

07:01.080 --> 07:03.660


That is putting some JSON data within the message.
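
A sketch of what block seven is doing; the payload contents are illustrative, the point is that message now holds a JSON object rather than a string:

# The message field receives an object, but it is mapped as text
curl --location --request POST 'http://localhost:9200/microservice-logs/_doc?pretty' \
  --header 'Content-Type: application/json' \
  --data-raw '{ "timestamp": "2021-07-01T10:03:00Z", "service": "ABC", "port": 12345, "message": { "data": { "received": "yes" } } }'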

07:07.490 --> 07:08.650


Let's get a clean slate here.

07:12.120 --> 07:13.080


And we got an error.

07:13.950 --> 07:16.470


So we see our old friend, the mapper_parsing_exception.

07:16.980 --> 07:22.710


This is because ignore_malformed can't handle JSON objects on the input, which is a
significant limitation

07:22.710 --> 07:23.550


to be aware of.

07:24.630 --> 07:29.670


Now when speaking of JSON objects, be aware that all the mapping ideas remain valid
for the nested

07:29.670 --> 07:30.570


parts as well.

07:31.320 --> 07:36.240


Continuing our scenario: after losing some logs to mapping exceptions, we decide it's time
to introduce

07:36.240 --> 07:40.380


a new payload field of the type object where we can store the JSON at will.

07:41.340 --> 07:45.960


Now remember, we have dynamic mapping in place so we can index it without first
creating its mapping.

07:46.920 --> 07:48.240


Let's go ahead and try that.

07:49.020 --> 07:51.630


See, we have a payload field now that contains that JSON data.
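
For example, something like this (block eight; the values and the nested field names are illustrative assumptions), with the JSON carried under a new payload field:

# The new payload field is not in the mapping yet; dynamic mapping will add it
curl --location --request POST 'http://localhost:9200/microservice-logs/_doc?pretty' \
  --header 'Content-Type: application/json' \
  --data-raw '{ "timestamp": "2021-07-01T10:04:00Z", "service": "ABC", "port": 12345, "message": "Request received", "payload": { "data": { "received": "yes" } } }'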

08:04.430 --> 08:04.940


All good.

08:05.390 --> 08:05.680


All right.

08:05.690 --> 08:08.510


Now we can check the mapping and focus on that payload field.

08:10.060 --> 08:11.080


We'll say curl.

08:11.080 --> 08:12.100


That's a GET request.

08:12.100 --> 08:14.080


Let's copy.
08:16.790 --> 08:25.010
localhost:9200/microservice-logs/_mapping?pretty.

08:29.530 --> 08:30.960


And let's find that payload field.

08:30.970 --> 08:31.480


There it is.

08:33.410 --> 08:37.460


So it was mapped as an object with sub properties defining the nested fields.
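
In the mapping output, the payload section has roughly this shape; dynamic mapping typically maps new string values as text with a keyword sub-field, though the exact output depends on your data and version:

"payload": {
  "properties": {
    "data": {
      "properties": {
        "received": {
          "type": "text",
          "fields": { "keyword": { "type": "keyword", "ignore_above": 256 } }
        }
      }
    }
  }
}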

08:37.640 --> 08:41.270


So apparently the dynamic mapping works, but there is a trap.

08:41.720 --> 08:47.120


The payloads, and generally any JSON object in a world of many producers and consumers, can
consist

08:47.120 --> 08:48.080


of almost anything.

08:48.950 --> 08:53.750


So who knows what will happen with a different JSON payload which also consists of a

08:53.750 --> 08:56.570


payload.data.received field, but with a different type of data.

08:58.130 --> 09:00.040


Let's try that with block nine here.

09:05.860 --> 09:08.170


Well, you see, we're just sending a slightly different payload here.
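
A sketch of block nine: the same shape of document, but payload.data.received now holds an object instead of a string, which the dynamically created text mapping cannot accept (field names and values are still illustrative):

# payload.data.received was dynamically mapped as text; now it arrives as an object
curl --location --request POST 'http://localhost:9200/microservice-logs/_doc?pretty' \
  --header 'Content-Type: application/json' \
  --data-raw '{ "timestamp": "2021-07-01T10:05:00Z", "service": "XYZ", "port": 12345, "message": "Request received", "payload": { "data": { "received": { "status": "ok" } } } }'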

09:15.460 --> 09:17.830


And again, we got the mapper_parsing_exception.

09:19.060 --> 09:20.170


So what else can we do?

09:20.620 --> 09:24.250


Well, engineers on the team need to be aware of these mapping mechanics.

09:24.670 --> 09:27.430


You can also establish shared guidelines for the log fields.

09:28.030 --> 09:33.070


Secondly, you may consider what's called a dead letter queue pattern that would store the
failed documents

09:33.070 --> 09:34.030


in a separate queue.

09:34.570 --> 09:38.890


These either need to be handled on an application level, or by employing Logstash's DL
09:38.890 --> 09:42.100
Q, which allows us to still process the failed documents.

09:43.650 --> 09:46.020


Let's clear and start with a fresh slate here.

09:47.160 --> 09:50.460


So now, the second area of caution in relation to mappings is limits.

09:51.570 --> 09:53.220


Even from super simple examples

09:53.220 --> 09:57.480


with payloads, you can see that the number of nested fields can start accumulating
pretty quickly.

09:57.870 --> 09:58.980


Where does this road end?

09:59.040 --> 10:03.930


Well, at the number 1000, which is the default limit of the number of fields in a
mapping.

10:04.800 --> 10:09.090


Let's simulate this exception in our safe playground environment before you unwillingly
meet it in

10:09.090 --> 10:10.230


your production environment.

10:11.040 --> 10:17.340


Let's start by creating a large dummy JSON document with 1001 fields, post it and
see what happens.

10:18.300 --> 10:24.140


So to create the document, we're going to use the example command below with the JQ
tool.

10:24.150 --> 10:30.180


And if you don't already have jq installed, you'll have to do that with sudo apt-get
install jq.

10:34.620 --> 10:38.010


And once you have that, you can create the JSON manually, or if you prefer,

10:38.070 --> 10:39.240


This is a little bit easier, actually.

10:39.240 --> 10:39.860


A lot easier.

10:39.880 --> 10:44.940


Just go to block ten here and we'll set up a variable called thousand and one
fields JSON that contains

10:44.940 --> 10:48.000


the following stuff, using jq.
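
A sketch of what block ten does; the variable name and the exact jq expression are assumptions here, the point is simply to build a JSON object with 1,001 distinct keys that you can then echo to inspect:

# Generate an object with keys field_0 ... field_1000 (1,001 fields in total)
THOUSAND_AND_ONE_FIELDS_JSON=$(jq -nc '[range(1001)] | map({("field_" + tostring): tostring}) | add')

# Take a look at what ended up in the variable
echo $THOUSAND_AND_ONE_FIELDS_JSON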

10:52.490 --> 10:52.910


Copy.

10:54.790 --> 10:55.270


Paste.

10:57.530 --> 11:03.650


And you can see all this is doing is echoing that 1001 times to that environment
variable and we can

11:03.650 --> 11:05.060


echo that to take a look at what's in it.

11:09.140 --> 11:09.650


Oh, yeah.

11:09.800 --> 11:10.880


1001 things.

11:14.050 --> 11:20.980


So we can now create a new plain index with a curl that says --location --request
PUT.

11:22.350 --> 11:27.680


http://localhost:9200/, and we'll call this one big-objects.

11:31.470 --> 11:33.450


And we'll post in our generated JSON.

11:42.580 --> 11:44.260


Big dash objects.

11:45.460 --> 11:46.000


Underscore.

11:46.000 --> 11:46.450


Doc.

11:47.380 --> 11:47.790


Question mark.

11:47.830 --> 11:50.590


Pretty backslash.

11:51.040 --> 11:51.340


Dash.

11:51.340 --> 11:51.790


Dash.

11:51.970 --> 11:52.480


Data.

11:52.480 --> 11:53.020


Dash.

11:53.020 --> 11:53.680


Raw.

11:55.440 --> 11:56.010


Quote.

11:57.220 --> 11:59.170


Dollar sign, thousand and one.

12:00.310 --> 12:00.910


Fields.

12:01.240 --> 12:08.890


fields JSON, and that will import the contents of that variable into our big-objects index.
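
Assembled, the two commands look roughly like this (a sketch; note the double quotes around the variable so the shell expands it):

# Create a plain index with default settings and mappings
curl --location --request PUT 'http://localhost:9200/big-objects'

# Try to index the 1,001-field document into it
curl --location --request POST 'http://localhost:9200/big-objects/_doc?pretty' \
  --header 'Content-Type: application/json' \
  --data-raw "$THOUSAND_AND_ONE_FIELDS_JSON"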

12:10.840 --> 12:12.760


And you can guess what happened.

12:13.150 --> 12:15.190


We went straight to the illegal_argument

12:15.190 --> 12:15.640


exception.

12:15.640 --> 12:19.450


An exception that informs us about the limit being exceeded very explicitly.

12:20.380 --> 12:21.370


So how do you handle that?

12:22.470 --> 12:26.310


Well, first, you should definitely think about what you're storing in your indices
and for what purpose.

12:26.520 --> 12:29.690


Secondly, if you still need to, you can increase this 1000 limit.

12:30.240 --> 12:31.110


But be careful.

12:31.110 --> 12:35.280


As with bigger complexity there might come a much bigger price of potential
performance degradation

12:35.550 --> 12:36.900


and high memory pressure.

12:37.950 --> 12:40.860


Changing this limit can be performed with a simple dynamic setting change.

12:40.860 --> 12:44.850


We can just say curl --location

12:47.300 --> 12:55.610


--request PUT http://localhost:9200/big-objects/_settings

12:57.920 --> 12:58.670


--data-raw

13:02.500 --> 13:08.380


It would be index.mapping.total_fields.limit.

13:08.890 --> 13:10.450


And we could set that to 1001.

13:12.500 --> 13:14.330


And that would get around that particular issue.
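
The full command looks roughly like this; since it is a dynamic setting, no close/open cycle is needed:

# Raise the total fields limit for the big-objects index
curl --location --request PUT 'http://localhost:9200/big-objects/_settings' \
  --header 'Content-Type: application/json' \
  --data-raw '{ "index.mapping.total_fields.limit": 1001 }'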

13:15.350 --> 13:15.830


All right.

13:15.830 --> 13:20.180


So now that you're more aware of the dangers lurking within mappings, you're much
better

13:20.180 --> 13:22.100


prepared for the production battlefield.
