During your Security Information and Event Management (SIEM) journey, many terms will be thrown your way. Understanding those terms is essential to your security environment.
In this article, we will bring clarity to one of the more important terms in SIEM: the Common Information Model (CIM).
Start with the CIM Basics!
- What is CIM?
- CIM is a way to translate, normalize, or reshape data from raw text format into a field:value format.
- What is the concept behind CIM?
- Event searches and calculations depend on whether the fields your search relies on actually exist in the data.
- Why do we really need CIM in Splunk?
- Every vendor (including Splunk) builds its apps and embedded searches against a standard CIM (field names and values) so that the dashboards, tables, and stats work as intended. If the data is not CIM compliant, the dashboards will fail to load because the embedded searches will fail.
- One of the most important uses for CIM data is within Splunk Enterprise Security and the Splunk data model. As you may know, data models are knowledge objects that have predefined fields and searches to explore and save data.
- Splunk Enterprise Security (Splunk ES) is a Splunk premium application that relies on the Splunk data model to run its predefined dashboards and saved searches, and to alert on notable events within your environment. Ensuring that your data is Splunk CIM compliant is the very first thing to do during data onboarding.
- Do you have an example?
- Let’s say you have a use case that searches for a web error (error_code=404.0), and every time you run this search, you don’t get any results.
- So, what does this mean?
- You don’t have that value for “error_code” in your data (field exists, value does not); or
- You don’t even have the field “error_code” listed or identified by Splunk (value may exist, field does not).
- What can I do to rectify this?
- If the field exists but that error value never appears (case #1), you got lucky: the error simply has not occurred.
- If the field is not listed but you can still see the value in the raw data (case #2), your data has not been extracted in the way the use case expects. In other words, your data isn’t CIM compliant.
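The two cases above can be told apart mechanically. The following is a minimal Python sketch, not Splunk code: the `diagnose` helper, the `(raw text, extracted fields)` event shape, and the sample web log lines are all hypothetical and exist only to illustrate the reasoning.

```python
from typing import Dict, List, Tuple

# Hypothetical event shape: (raw text, fields that were extracted from it)
Event = Tuple[str, Dict[str, str]]


def diagnose(events: List[Event], field: str, value: str) -> str:
    """Explain why a search for field=value returns results or not."""
    if any(fields.get(field) == value for _, fields in events):
        return "match: the search returns results"
    if any(field in fields for _, fields in events):
        return "case 1: field exists, value does not"
    if any(value in raw for raw, _ in events):
        return "case 2: value is in the raw data, field is not extracted"
    return "neither the field nor the value appears in the data"


# Made-up web log lines with no fields extracted at all
events = [
    ("GET /missing.html 404.0", {}),
    ("GET /index.html 200.0", {}),
]
print(diagnose(events, "error_code", "404.0"))
```

Here the value is visible in the raw text but no `error_code` field was extracted, so the sketch reports case #2, which is the situation CIM mapping is meant to fix.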
Now we have a better understanding of CIM.
We need to ask ourselves, “How can we accurately and methodically translate our data to CIM?”
- The How.
- Now that we understand that extracting values from the data into the right fields makes our data CIM compliant, we need to understand how Splunk extracts those fields so that we can make our data CIM friendly within our Splunk environment.
- Splunk extracts fields in one or more of the following three phases:
- Forwarding time: when Splunk reads data and sends it to the indexers
- Indexing time: when Splunk receives the data and writes it to disk
- Searching time: when Splunk reads the data again from the Splunk indexers
- During all the above stages of field extraction, Splunk will rely on two main files to extract the fields and their values for you:
- props.conf
- transforms.conf
- If you want to make your data CIM compliant, you need to work on those files at the right stage. That way, Splunk can understand your data and make the right fields visible to you.
- Please refer to the Splunk documentation for props.conf and transforms.conf for further understanding.
- Splunk uses either field delimiters or regular expressions to extract fields.
- To learn more about field extraction, please refer to the Splunk documentation.
- For Splunk Enterprise Security, or if you want to use a predefined Splunk data model, you have to map your data to that data model’s CIM. This means that your data has to expose the same fields and values that the data model is expecting.
- Each data model in Splunk is documented, and you can refer to that documentation to understand which fields and values the data model expects. This is key to a successful Splunk Enterprise Security deployment!
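To make the difference between delimiter-based and regular-expression-based extraction concrete, here is a small Python sketch of both approaches. It only imitates the idea and is not how Splunk implements extraction. The function names and the sample line are hypothetical, and Python writes named groups as `(?P<name>...)` where Splunk regexes use `(?<name>...)`.

```python
import re
from typing import Dict


def extract_by_delimiters(raw: str, pair_delim: str = " ", kv_delim: str = "=") -> Dict[str, str]:
    """Split the event into pairs, then split each pair into key and value."""
    fields = {}
    for token in raw.split(pair_delim):
        key, sep, value = token.partition(kv_delim)
        if sep and key:  # keep only tokens that really contain key=value
            fields[key] = value
    return fields


def extract_by_regex(raw: str, pattern: str) -> Dict[str, str]:
    """Return the named capture groups of the first match, or no fields."""
    match = re.search(pattern, raw)
    return match.groupdict() if match else {}


raw = "host=10.0.0.5 user=alice status=ok"  # hypothetical event
print(extract_by_delimiters(raw))
print(extract_by_regex(raw, r"user=(?P<user>\w+)"))
```

Delimiters work well when every event is a clean run of key=value pairs. A regular expression is the better fit when only part of the line is structured or when the field name you need differs from the one in the log.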
- Do you have a Data Mapping Example?
- Let’s say I need to run a dashboard in ES that is relying on the authentication data model.
- First, I would refer to the documentation for that data model to see what fields and values it is expecting.
- Next, I will need to work on my data using our lovely friends (props & transforms) to get the right fields with the right values.
- When I view my data in Splunk search, I see the below log coming in:
- 07/04/2006 12:32:00 host=129.33.66.1 user=test message= Failed login
- This simple line is almost enough to have the authentication data model working as expected, but now I just need to extract and map the fields to match what the document indicated:
- First, start with the first field (action).
- According to the documentation, action is a required field that expects success or failure values, but I don’t have that field in my data, or at least not plainly. So here is where the fun begins. We need to understand the logs and then get the field extracted correctly. After reading the logs, I know that:
- I can extract the user, src, and message directly
- I can evaluate the action field from the message field. For instance: if message is “Failed login”, then action = failure; otherwise action = success
- So I will add some entries to my props.conf:
- EXTRACT-user = user=(?<user>\w+)
- EXTRACT-src = host=(?<src>\S*)
- EXTRACT-message = message=\s(?<message>\S.+)
- EVAL-action = if(message=="Failed login", "failure", "success")
- Now I have extracted and mapped 3 mandatory fields that the authentication data model is expecting. The data model can search the data and populate its results into my dashboard. (*Please note that the above regex is only intended for educational purposes and may not necessarily work in your environment.)
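Before putting extractions like these into props.conf, it can help to replay them against the sample log line outside of Splunk. The sketch below does that in Python under a few assumptions: the `extract_fields` helper is hypothetical, the regexes are rewritten with Python's `(?P<name>...)` group syntax, and the second log line is invented to show the success branch. Note that the message pattern skips the space after `message=`, so the captured value is `Failed login` with no leading space.

```python
import re
from typing import Dict

SAMPLE = "07/04/2006 12:32:00 host=129.33.66.1 user=test message= Failed login"

# Python versions of the three extractions from the example
EXTRACTIONS = {
    "user": r"user=(?P<user>\w+)",
    "src": r"host=(?P<src>\S*)",
    "message": r"message=\s(?P<message>\S.+)",
}


def extract_fields(raw: str) -> Dict[str, str]:
    """Apply each extraction, then derive action from message."""
    fields: Dict[str, str] = {}
    for pattern in EXTRACTIONS.values():
        match = re.search(pattern, raw)
        if match:
            fields.update(match.groupdict())
    # Mirrors the if/else rule from the example: anything other than
    # "Failed login" (including a missing message) is treated as success.
    fields["action"] = "failure" if fields.get("message") == "Failed login" else "success"
    return fields


print(extract_fields(SAMPLE))

# Invented line to exercise the success branch
OK_LINE = "07/04/2006 12:33:10 host=129.33.66.1 user=test message= Login ok"
print(extract_fields(OK_LINE)["action"])
```

Because the rule treats everything except one exact message as a success, real data with other failure messages would need a broader condition. The sketch keeps the rule as simple as the example does.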
But wait! There’s more! Don’t forget to tag your data 😊
As we have worked hard to make our data CIM compatible with our Splunk ES environment and data model, it is important that we tag our data. Refer to the data model documentation in order to tag your data correctly. The data model will actually search your tags to find the extracted fields you have provided. See the Splunk documentation to learn more about tags.
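The role of tags can be pictured with a short Python sketch. It is only an illustration of the idea that untagged events stay invisible to a data model, not a description of Splunk internals. The event dictionaries and the `visible_to_data_model` helper are hypothetical, and the tag name `authentication` is an assumption that should be checked against the data model documentation.

```python
from typing import Dict, List

REQUIRED_TAG = "authentication"  # assumption: confirm in the data model documentation


def visible_to_data_model(events: List[Dict], required_tag: str = REQUIRED_TAG) -> List[Dict]:
    """Keep only the events that carry the tag the data model searches for."""
    return [event for event in events if required_tag in event.get("tags", set())]


events = [
    # Correct fields and the expected tag: picked up
    {"user": "test", "src": "129.33.66.1", "action": "failure", "tags": {"authentication"}},
    # Same fields but no tag: ignored, even though the fields are correct
    {"user": "test", "src": "129.33.66.1", "action": "failure", "tags": set()},
]
print(len(visible_to_data_model(events)))
```

In other words, correct field extraction is necessary but not sufficient: without the expected tag, the second event never reaches the dashboard.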
CIM Summary
CIM is a standard way to read and understand logs so that when Splunk is reading and searching your logs, it can find the relevant fields inside your indexed data. Making your Splunk data CIM compliant during data onboarding will ensure a smooth deployment of ES or of any other essential security application that you may be looking to use.
About SP6
SP6 is a Splunk consulting firm focused on Splunk professional services including Splunk deployment, ongoing Splunk administration, and Splunk development. SP6 has a separate division that also offers Splunk recruitment and the placement of Splunk professionals into direct-hire (FTE) roles for those companies that may require assistance with acquiring their own full-time staff, given the challenge that currently exists in the market today.