In this post, we’ll review some frequently asked questions concerning the Splunk Common Information Model, or Splunk CIM.
What does CIM (pronounced “sim” and often confused with SIEM) stand for?
Common Information Model
What is it in simplest terms?
The Common Information Model (CIM) is a standard created by Splunk for normalizing the log data of different systems based on the type of information the log events represent. Certain Splunk premium apps, including Enterprise Security (ES), IT Service Intelligence (ITSI), and User Behavior Analytics (UBA), rely heavily on your data being CIM compliant in order for correlation searches to return useful results.
Why would I want to use the Splunk Common Information Model?
To make searching your data easier and more insightful.
Splunk can ingest log data of just about any format. Unfortunately, the data arriving in Splunk from the myriad devices that are used in data centers do not share a common naming convention for the individual fields within a log event. Furthermore, it is frequently unclear which data must be searched to get a complete answer to your query. The Splunk CIM addresses both of these issues.
Consider a user logging into a system. Whether the user is logging in to their workstation, a database, or a website, the primary components of an “authentication” (or “logon”) event are essentially the same: who is the user (user), where are they coming from (src), what are they trying to log into (dest), and was the login successful (action). Each of these systems (the workstation, the database, and the website) reports the authentication event with different log messages and different natively extracted fields. For example, the website’s log might capture the user in a field called “cs_user” and the destination server’s IP address in a field called “s_ip”.
If you wanted to search all three log sources at the same time for login events by the user “jsmith”, you would have to know in detail which log sources contain which fields and write some fairly complex queries. With the Splunk Common Information Model, you can simply search “tag=authentication user=jsmith”, and any authentication-related data that has been normalized to the Common Information Model will appear in your result set.
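To make the contrast concrete, here is a minimal Python sketch of the two styles of query. It is a toy model, not Splunk code: the source type names and every native field other than “cs_user” and “s_ip” are invented for illustration, and the events are shown as they might look after normalization, with the native fields kept and the CIM fields and tag added.

```python
# Hypothetical events after search-time normalization. Native fields are kept;
# the CIM fields (user, dest, action) and the "authentication" tag are added.
# Source type names and most native field names are made up for this sketch.
EVENTS = [
    {"sourcetype": "web_access", "cs_user": "jsmith", "s_ip": "10.0.0.5",
     "user": "jsmith", "dest": "10.0.0.5", "action": "success",
     "tags": {"authentication"}},
    {"sourcetype": "db_audit", "login_name": "jsmith", "db_host": "db01",
     "user": "jsmith", "dest": "db01", "action": "failure",
     "tags": {"authentication"}},
    {"sourcetype": "workstation", "AccountName": "adoe", "Computer": "ws42",
     "user": "adoe", "dest": "ws42", "action": "success",
     "tags": {"authentication"}},
]


def find_logins_native(events, username):
    """Without a common model: one clause per source, each with its own field."""
    return [
        e for e in events
        if (e["sourcetype"] == "web_access" and e.get("cs_user") == username)
        or (e["sourcetype"] == "db_audit" and e.get("login_name") == username)
        or (e["sourcetype"] == "workstation" and e.get("AccountName") == username)
    ]


def find_logins_cim(events, username):
    """With CIM fields: a stand-in for `tag=authentication user=<username>`."""
    return [
        e for e in events
        if "authentication" in e["tags"] and e.get("user") == username
    ]


if __name__ == "__main__":
    for event in find_logins_cim(EVENTS, "jsmith"):
        print(event["sourcetype"], event["dest"], event["action"])
```

Both functions return the same events, but only the first has to change every time a new log source is added.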
What does normalization really mean?
Normalizing your data means ensuring the fields in your data are compliant with the Common Information Model. Generally, forcing all of your source systems to update their raw logging to comply with this standard would be very difficult to achieve and nearly impossible to manage in the long term, so Splunk uses a variety of techniques (knowledge objects) to normalize data at search time.
Some of the most common techniques include:
• Eventtypes (Splunk SPL searches that identify a group of events) are used to identify a certain type of event (let’s say a Microsoft Windows authentication event – windows_logon_success).
• Tags (like hashtags applied to eventtypes) group eventtypes into categories (windows_logon_success would be given the tag “authentication” because it applies to the Authentication data model).
• Field aliases (renaming extracted fields) and calculated fields (running eval statements on fields at search time) are used to make sure log messages look alike across log sources (the fields “username” and “uname” are both mapped to the CIM compliant “user”).
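The way these knowledge objects work together can be sketched in a few lines of Python. This is only an illustration of the idea, not how Splunk implements it: the source types “wineventlog_demo” and “webapp_demo”, the “webapp_login” eventtype, the “uri” and “status” fields, and the action logic are all assumptions made for the example. Only “windows_logon_success”, “username”, “uname”, “user”, and the “authentication” tag come from the text above.

```python
# Toy stand-ins for Splunk knowledge objects. Everything here is hypothetical
# apart from the names quoted in the article.

# Field aliases: per source type, native field name -> CIM field name.
FIELD_ALIASES = {
    "wineventlog_demo": {"username": "user"},
    "webapp_demo": {"uname": "user"},
}

# Calculated fields: per source type, CIM field -> expression over the event.
CALCULATED_FIELDS = {
    "wineventlog_demo": {"action": lambda e: "success"},
    "webapp_demo": {
        "action": lambda e: "success" if e.get("status") == "200" else "failure"
    },
}

# Eventtypes: name -> predicate standing in for a saved SPL search.
EVENTTYPES = {
    "windows_logon_success": lambda e: (
        e.get("sourcetype") == "wineventlog_demo" and e.get("EventCode") == "4624"
    ),
    "webapp_login": lambda e: (
        e.get("sourcetype") == "webapp_demo" and e.get("uri") == "/login"
    ),
}

# Tags: eventtype name -> set of tags applied to matching events.
TAGS = {
    "windows_logon_success": {"authentication"},
    "webapp_login": {"authentication"},
}


def normalize(event):
    """Apply aliases, calculated fields, eventtypes, then tags to one event."""
    out = dict(event)  # the native fields are kept, never overwritten
    sourcetype = event.get("sourcetype")
    for native, cim in FIELD_ALIASES.get(sourcetype, {}).items():
        if native in event:
            out[cim] = event[native]
    for field, expression in CALCULATED_FIELDS.get(sourcetype, {}).items():
        out[field] = expression(out)
    out["eventtype"] = sorted(
        name for name, matches in EVENTTYPES.items() if matches(out)
    )
    out["tags"] = set().union(*(TAGS.get(name, set()) for name in out["eventtype"]))
    return out


if __name__ == "__main__":
    raw = {"sourcetype": "webapp_demo", "uname": "jsmith",
           "uri": "/login", "status": "200"}
    print(normalize(raw))
```

Nothing in the raw event is modified; the CIM fields and tags are layered on top each time the event is read, which is what normalizing at search time means here.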
Do I have to do the normalization myself?
You can find add-ons on Splunkbase that create field aliases and other supporting Splunk knowledge objects that make data from a particular vendor’s devices or log source compliant with one or more of the Splunk data models listed below. Look for “CIM Versions” under the “Compatibility” section on the right side of the web page to see if an add-on you’re interested in is already CIM compliant.
You will still need to review the definitions for the data models that interest you to make sure that they are only including data relevant to the category. To fine-tune the data model, you may need to constrain (or expand) which indexes, source types, and Splunk knowledge objects identify the desired data set.
I’ve normalized my data. Now what can I do with it?
In addition to making your search queries easier and more intuitive, you can leverage normalized data to vastly improve your search performance. Splunk provides a free add-on, the Splunk Common Information Model (CIM), which is available on Splunkbase.
This add-on includes a data model for each category in the Common Information Model, and each data model can be accelerated to improve search performance on your normalized data sets. Splunk created data models as a means to define which data belongs in a category and what fields you can expect to be present for you to search on.
The Splunk CIM currently has data models defined for 22 categories:
- Alerts
- Application State
- Authentication
- Certificates
- Change Analysis
- Databases
- Data Loss Prevention
- Email
- Interprocess Messaging
- Intrusion Detection
- Inventory
- Java Virtual Machines (JVM)
- Malware
- Network Resolution (DNS)
- Network Sessions
- Network Traffic
- Performance
- Splunk Audit Logs
- Ticket Management
- Updates
- Vulnerabilities
- Web
In each data model definition, you will find the fields and Splunk knowledge objects used to delineate which data will be included in the data set for that category. When you accelerate a data model, you are instructing Splunk to summarize your raw data periodically into ONLY the fields required for that data model definition. By searching this summarized version of the data, you can vastly improve your search performance. The cost of this acceleration is increased CPU utilization during the periodic searches against the raw data, and the increased disk usage required to store the summarized results – but generally the advantages outweigh the resource costs, especially for frequently accessed data sets.
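The trade-off behind acceleration can be illustrated with a small Python sketch. It is a deliberately simplified model, not Splunk’s actual summary format: the field list, the five-minute span, and the sample events are all assumptions made for the example. The point is only that the summary is built once from the raw events and keeps nothing but the data model’s fields, so later searches never touch the raw data.

```python
from collections import Counter

# Assumed subset of fields for a toy "Authentication" data model.
DATA_MODEL_FIELDS = ("user", "src", "dest", "action")


def build_summary(events, span=300):
    """Periodic summarization: keep only model fields, bucketed by time span."""
    summary = Counter()
    for event in events:
        if "authentication" not in event.get("tags", ()):
            continue  # the data model's constraint excludes other events
        bucket = event["_time"] - event["_time"] % span
        key = (bucket,) + tuple(event.get(f) for f in DATA_MODEL_FIELDS)
        summary[key] += 1
    return summary


def count_by_action(summary, user):
    """Answer a question from the summary alone, without reading raw events."""
    counts = Counter()
    for (_bucket, event_user, _src, _dest, action), n in summary.items():
        if event_user == user:
            counts[action] += n
    return counts


# Hypothetical normalized events; "_raw" stands for the full original log line.
RAW = [
    {"_time": 1000, "tags": {"authentication"}, "user": "jsmith",
     "src": "192.0.2.10", "dest": "db01", "action": "failure",
     "_raw": "login failed for jsmith from 192.0.2.10"},
    {"_time": 1010, "tags": {"authentication"}, "user": "jsmith",
     "src": "192.0.2.10", "dest": "db01", "action": "failure",
     "_raw": "login failed for jsmith from 192.0.2.10"},
    {"_time": 1400, "tags": {"authentication"}, "user": "jsmith",
     "src": "192.0.2.10", "dest": "db01", "action": "success",
     "_raw": "login ok for jsmith from 192.0.2.10"},
    {"_time": 1020, "tags": {"web"}, "user": "jsmith",
     "src": "192.0.2.10", "dest": "10.0.0.5", "action": "allowed",
     "_raw": "GET /index.html 200"},
]

if __name__ == "__main__":
    summary = build_summary(RAW)
    print(count_by_action(summary, "jsmith"))
```

Building the summary costs CPU time up front and extra storage for the result, which mirrors the resource costs described above; the benefit is that repeated questions are answered from a much smaller structure.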
This concept of data model acceleration is ESSENTIAL to the function of Splunk premium products such as Enterprise Security and IT Service Intelligence. It allows users to run queries targeted at categories such as Network Traffic or Performance against a summarized data set, which can be searched hundreds or thousands of times more quickly than the raw version of the data.
In addition to all this:
• With CIM normalized data, you can take advantage of Splunk’s Pivot tool which allows users who aren’t as familiar with writing SPL queries to analyze their logs through an interface similar to an Excel pivot table.
• You can rest assured that when a new data source is added, as long as it is normalized to the CIM, your CIM compliant queries will still take your new data into account – no more rewriting queries every time a new log source is added!
• You can also more easily share queries with other Splunkers online who have written their queries to fit Common Information Model fields.
Timing and Effort for Splunk CIM Compliance
It’s preferable to normalize data as new data types are ingested rather than trying to do it all at once after the fact. The more data types your company has in Splunk, the more time it will take to make your data CIM compliant.
If your company, like many others out there, has come late to CIM compliance, it may be difficult for even a knowledgeable Splunk admin to carve out the time to get this job done. It can be helpful to engage professional services assistance to accelerate this process.
About SP6
SP6 is a Splunk consulting firm focused on Splunk professional services, including Splunk deployment, ongoing Splunk administration, and Splunk development. SP6 also has a separate division that offers Splunk recruitment and the placement of Splunk professionals into direct-hire (FTE) roles for companies that need help acquiring their own full-time staff, given how challenging the current market is.