Many Splunk customers start with a single disk for storing indexes and later want to make changes.
We generally find our customers want to make a change to (1) improve search performance, especially for hot and warm data, (2) increase daily data ingestion, or (3) retain indexes for longer durations. In all these cases, the goal is to keep hot/warm buckets on faster disks and cold buckets on cheaper storage.
Below are the steps to follow for a disk migration where we move hot/warm buckets to a new SSD and keep cold buckets in their existing place ($SPLUNK_DB).
Disk Migration Assumptions
- The Splunk process is running as splunk user;
- The new SSD mountpoint is /data/splunk_hot_warm;
- The new SSD is mounted at boot time, owned by the splunk user (the user that runs the Splunk process), and writable;
- The new SSD has enough space to store existing hot/warm buckets;
- SPLUNK_HOME=/opt/splunk and $SPLUNK_DB=/opt/splunk/var/lib/splunk; and
- There are four indexers in the current environment: Indexer1, Indexer2, Indexer3, and Indexer4.
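A quick, hedged way to sanity-check these assumptions on each indexer before starting (the mountpoint is the one assumed above; adjust to your environment):
- df -h /data/splunk_hot_warm   # confirm the mount exists and has enough free space
- ls -ld /data/splunk_hot_warm   # confirm splunk:splunk ownership
- sudo -u splunk touch /data/splunk_hot_warm/.write_test && sudo -u splunk rm /data/splunk_hot_warm/.write_test   # confirm the splunk user can write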
To Prepare For the Change:
Take a backup of all existing indexes.conf files deployed on the indexers.
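One hedged way to take that backup (paths assume SPLUNK_HOME=/opt/splunk as stated above; adjust to your environment):
- On the CM: tar -czf /tmp/indexes_conf_backup_cm_$(date +%F).tar.gz $(find /opt/splunk/etc/master-apps -name indexes.conf)
- On each indexer: tar -czf /tmp/indexes_conf_backup_$(hostname)_$(date +%F).tar.gz $(find /opt/splunk/etc -name indexes.conf)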
Disk Migration Steps
Step 1: Analyze the current indexes.conf that is deployed on the indexers. It should have volume definitions. Even if the current index settings store everything on $SPLUNK_DB, we should define the hot and cold volumes pointing to the absolute path of $SPLUNK_DB. This way we first verify that the volume-based definition works properly, and then switch the hot volume path to the new disk path.
Before the volume-based definition:
- [f5]
- disabled = false
- homePath = $SPLUNK_DB/f5/db
- coldPath = $SPLUNK_DB/f5/colddb
- thawedPath = $SPLUNK_DB/f5/thaweddb
After the volume-based definition:
- [volume:hot]
- path = /opt/splunk/var/lib/splunk
- #path = /data/splunk_hot_warm/var/lib/splunk # future use
- #maxVolumeDataSizeMB = 4311810
- # 4.1 TB is the max size of the new SSD in this example; set this according to your needs
- [volume:cold]
- path = /opt/splunk/var/lib/splunk
- [f5]
- disabled = false
- homePath = volume:hot/f5/db
- coldPath = volume:cold/f5/colddb
- thawedPath = $SPLUNK_DB/f5/thaweddb
NOTE: thawedPath does not accept a volume-based path; we have to use either $SPLUNK_DB or an absolute path in the definition. We are keeping tstatsHomePath out of scope here, but it needs to be added to your migration planning.
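To confirm how Splunk resolves the new volume-based definition on an indexer, btool can dump the effective settings; f5 here is just the example index from above:
- /opt/splunk/bin/splunk btool indexes list f5 --debug
- /opt/splunk/bin/splunk btool indexes list volume:hot --debug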
Step 2: Verify from internal logs that no errors are reported after the volume-based definition is deployed for indexes.
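As a starting point for that verification (a hedged example; adjust the time range and filters to your environment), the following SPL surfaces splunkd errors logged since the change:
- index=_internal sourcetype=splunkd log_level=ERROR earliest=-60m | stats count by host, component | sort -count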
Step 3: Run the CLI below in the $SPLUNK_DB directory on each indexer; it gives you the total size of the current hot/warm buckets.
- CLI: find ./ -name "db" -type d | xargs du -hsc
You should validate the above results using the dbinspect command on the Monitoring Console for the indexers.
- | dbinspect index=* splunk_server=<indexer_names> | search (state=hot OR state=warm) | stats sum(sizeOnDiskMB) as hot_volume_size_mb by splunk_server | eval hot_volume_size_gb = round((hot_volume_size_mb/1000),3) | addcoltotals label=Total hot_volume_size_gb
Step 4: Validate and make sure that the hot and warm bucket disk usage reported in the previous CLI is not greater than the new SSD size. In fact, make sure that it’s not more than 90% of the new SSD size to keep some cushion.
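As a worked example under the assumptions above: a 4311810 MB (~4.1 TB) SSD leaves roughly 3880629 MB at the 90% mark, so the Step 3 total must stay below that. A quick way to compare the two numbers on an indexer:
- df -BM /data/splunk_hot_warm   # size of the new SSD in MB
- cd /opt/splunk/var/lib/splunk && find ./ -name "db" -type d | xargs du -msc | tail -1   # current hot/warm usage in MB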
Step 5: At this point, you should remove the indexes.conf from the indexer management app on the CM (i.e., from master-apps) and move the indexes.conf file from slave-apps to $SPLUNK_HOME/etc/system/local on all the indexers, as sketched below. Push the new bundle from the CM to the indexers. This is a temporary change until the SSD migration is finished.
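A minimal sketch of Step 5, where <indexer_management_app> is a placeholder for whatever your master-apps/slave-apps app is actually called, and indexes.conf is assumed to live in its local (or default) directory:
- On each indexer (do this first, so the index definitions survive the push): cp /opt/splunk/etc/slave-apps/<indexer_management_app>/local/indexes.conf /opt/splunk/etc/system/local/indexes.conf
- On the CM: remove indexes.conf from /opt/splunk/etc/master-apps/<indexer_management_app>/local/ and push the bundle (the push commands are sketched after Step 20).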
Step 6: The actual migration on each indexer now starts. Put the CM in maintenance mode (see the CLI sketch after the notes below) and restart Splunk on the first indexer only. Restarting Splunk rolls all the current hot buckets to warm and reduces the chance of bucket ID collisions after migration.
- This step can be overkill if you have a small environment (less than 500 GB total disk usage and < 100 GB daily usage with bucket size set to default auto/750MB) or if the indexer is already restarted before this step.
NOTE: There are CLI and curl commands available but they just tell you if the bucket roll-off started or not; they do not let you track the status (% complete) of the bucket rolling. While it completes immediately in most cases, we don’t want to assume anything.
Just an FYI: though it is not required, you can still run the SPL below and verify that the total hot bucket size on the indexer has decreased after the Splunk restart.
- | dbinspect index=* splunk_server=<indexer_name> | search (state=hot) | stats sum(sizeOnDiskMB) as hot_volume_size_mb by splunk_server | eval hot_volume_size_gb = round((hot_volume_size_mb/1000),3) | addcoltotals label=Total hot_volume_size_gb
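For the maintenance-mode toggle and restart in Step 6, the usual CLI is as follows (assuming SPLUNK_HOME=/opt/splunk on the CM as well):
- On the CM: /opt/splunk/bin/splunk enable maintenance-mode   # confirm when prompted; Step 13 later reverses this with /opt/splunk/bin/splunk disable maintenance-mode
- On Indexer1: /opt/splunk/bin/splunk restart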
Step 7: Stop Splunk on Indexer1 with /opt/splunk/bin/splunk stop
Step 8: Modify the indexes.conf on Indexer1: change the definition of volume:hot and point it to the new SSD location. The cold volume remains as it is.
- [volume:hot]
- path = /data/splunk_hot_warm/var/lib/splunk
- maxVolumeDataSizeMB = 4311810
- # 4.1 TB is the max size of the new SSD in this example; set this according to your needs
NOTE: We are making this change only on Indexer1 and not on all the indexers.
Step 9: Copy the existing hot and warm buckets (the directories named db) to the new SSD using the Linux CLI.
- cd /opt/splunk/var/lib/splunk/
- nohup find ./ -name "db" -type d | xargs cp -r --parents -t /data/splunk_hot_warm/var/lib/splunk/ >> ./copy_status.txt 2>&1 &
Note: We are running the above process in the background so that even if we lose connection or close the terminal, the copy process would still continue and we can check the copy output in copy_status.txt file.
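As an alternative to the find | xargs cp approach above (not what these steps use; just an option, and it assumes the standard $SPLUNK_DB/<index>/db layout), rsync preserves the relative directory structure and can be re-run safely if the copy is interrupted:
- cd /opt/splunk/var/lib/splunk/
- nohup rsync -aR ./*/db /data/splunk_hot_warm/var/lib/splunk/ >> ./copy_status.txt 2>&1 &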
Step 10: You can check whether the copy has completed by tracking the background find/cp processes with ps -ef (grep for cp or find, since nohup itself will not appear once the command is running) as well as by looking at the copy_status.txt file. Once the copy process ends, switch to the /data/splunk_hot_warm/var/lib/splunk directory and check the total hot and warm bucket size using the same CLI that we used earlier in Step 3. The total disk usage should be around the same as we found in Step 3 (before migration). However, the disk usage will not be exactly the same, due to factors such as buckets rolling from warm to cold during this process, OS file-system characteristics, etc.
- cd /data/splunk_hot_warm/var/lib/splunk
- find ./ -name "db" -type d | xargs du -hsc
Step 11: To make sure the new SSD mountpoint has the correct ownership, run this CLI as root after copying the buckets: chown -R splunk:splunk <new_mountpoint>.
Step 12: Start Splunk on Indexer1 with /opt/splunk/bin/splunk start
Step 13: Disable Maintenance mode on CM.
Step 14: Check for any critical errors in the splunkd logs on the CM and the indexers, or any critical messages in the UI. Run a few sample queries against the _internal index as well as against some high-traffic data (Windows events, for example) and validate that results are coming from Indexer1, on which we migrated the hot and warm buckets.
Step 15: Run the dbinspect command and validate the hot bucket paths for actively written indexes. The hot bucket paths should point to the new SSD directories.
- | dbinspect index=* splunk_server=<indexer1> | search (state=hot)
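A variant of the same search that lists the resolved paths explicitly (dbinspect returns a path field), so you can confirm every hot bucket sits under the new mountpoint:
- | dbinspect index=* splunk_server=<indexer1> | search state=hot | stats count by index, path | search path="/data/splunk_hot_warm/*"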
Step 16 (Optional): Splunk search query to check the hot + warm bucket size on each indexer after migration.
- | dbinspect index=* splunk_server=<indexer_name> | search (state=hot OR state=warm) | stats sum(sizeOnDiskMB) as hot_volume_size_mb by index | eval hot_volume_size_gb = round((hot_volume_size_mb/1000),3) | addcoltotals label=Total hot_volume_size_gb
Note: The numbers in the above search output should be similar to those from Step 3.
Step 17: Now we need to delete the hot/warm bucket directories from $SPLUNK_DB (the old path). Run the following commands:
- cd /opt/splunk/var/lib/splunk
- find ./ -name "db" -type d
Analyze the output of the above CLI. It should list the db directories of all the indexes. These directories have already been copied to the new SSD and are now in effect there. We can proceed and delete them from the old path using the command below (a quick safety check follows the command).
- find ./ -name "db" -type d -exec rm -r "{}" \;
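Before running the delete above, a hedged safety check: confirm you are in the old $SPLUNK_DB path and that the directory counts on the old and new disks match.
- pwd   # must be /opt/splunk/var/lib/splunk, not the new SSD path
- find ./ -name "db" -type d | wc -l
- find /data/splunk_hot_warm/var/lib/splunk -name "db" -type d | wc -l   # the two counts should match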
Step 18: If everything looks good, repeat the same steps (Steps 6 through 17) for the other indexers, one by one.
Step 19: Once the migration is complete on all indexers, take a copy of the final indexes.conf from any one indexer and push it from the CM to all indexers (a sketch of this push follows Step 20).
Step 20: Delete the local copy ($SPLUNK_HOME/etc/system/local) of indexes.conf on indexers.
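A minimal sketch of the Step 19/20 push, reusing the <indexer_management_app> placeholder from Step 5:
- On the CM: cp indexes.conf /opt/splunk/etc/master-apps/<indexer_management_app>/local/
- On the CM: /opt/splunk/bin/splunk validate cluster-bundle && /opt/splunk/bin/splunk apply cluster-bundle
- On each indexer, once the push has succeeded: rm /opt/splunk/etc/system/local/indexes.conf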
Disk Migration: Summary
Because many Splunk customers start with a single disk for storing Indexes and then want to make changes, we hope you found this disk migration guide helpful.
About SP6
SP6 is a Splunk consulting firm focused on Splunk professional services, including Splunk deployment, ongoing Splunk administration, and Splunk development. SP6 also has a separate division that offers Splunk recruitment and the placement of Splunk professionals into direct-hire (FTE) roles for companies that need help acquiring their own full-time staff in today's challenging market.