batchworks.de

How tsidxWritingLevel affects storage size and performance

2022-03-15T00:00:00+00:00

Today we want to have a look at an index parameter and how it is affecting storage size and performance. In indexes.conf.spec you find the parameter tsidxWritingLevel. This parameters will configure how splunk creates index files over your rawdata within a bucket. The parameter was introduced in v7.2 and updated in v7.3 and v8.1. This also sets the minimum splunk version for a bucket index, meaning you will not be able to read buckets created on v8.1 with level 4 on a v8.0 system.

Overview of splunk versions and default tsidxWritingLevel:

Splunk Version	available tsidxWritingLevel	default tsidxWritingLevel
v8.2.0	1,2,3,4	2
v8.1.0	1,2,3,4	1
v8.0.0	1,2,3	1
v7.3.0	1,2,3	1
v7.2.0	1,2	1
v7.1.0	no setting, 1 assumed	1

So if you want to benefit from the latest storage and performance improvements from Splunk Enterprise you have to increase this setting. As I haven’t found any valid numbers besides “up to 40% reduced storage”* what an increase of the parameter will mean in real world, i decided to test it out on my own.

When changing this setting, only new buckets will be created with the higher level. Old data which was produced with an older setting will not be converted.

Test scenario

I created a test where I startup a single Instance on AWS, feed it with some logs, capture the time taken and the size of the tsidx files and repeat for every tsidxWritingLevel 3 times to validate the results.

test steps:

run splunk on AWS instance: m5.xlarge (4vCPU, 16GB RAM), 30GB Storage, default SSD
set tsidxWritingLevel
ingest 950k Events (863MB of raw data) of RouterOS logfiles taken from a prod system
measure time needed for ingest (5 second interval) and the size of buckets created
repeat this test 3 times

Test results

Splunk Enterprise v8.1.5

run number	tsidxWritingLevel	time taken ingest	Bucket sizeOnDisk (MB)	load avg	1min load avg
1	1	95s	519.89	3.95, 1.68, 1.56	4.05
2	1	100s	519.25	4.23, 2.57, 1.9	4.05
3	1	95s	525.32	3.97, 2.98, 2.14	4.05
1	2	100s	475.30	4.19, 3.29, 2.36	4.47
2	2	100s	475.89	4.61, 3.68, 2.63	4.47
3	2	95s	472.64	4.62, 3.86, 2.83	4.47
1	3	105s	461.16	4.17, 3.83, 2.96	4.29
2	3	100s	452.96	3.9, 3.71, 3.03	4.29
3	3	95s	450.67	4.79, 3.97, 3.21	4.29
1	4	105s	403.32	4.35, 3.92, 3.29	4.07
2	4	100s	413.07	3.85, 3.82, 3.34	4.07
3	4	100s	407.33	4.01, 3.83, 3.4	4.07

average tsidx storage needed:

tsidxWritingLevel 1: (519.89 + 519.25 + 525.32) / 3 = 521.49 MB = 100%
tsidxWritingLevel 2: (475.30 + 475.89 + 472.64) / 3 = 474.61 MB = 91.4% (-8.6% reduction)
tsidxWritingLevel 3: (461.16 + 452.96 + 450.67) / 3 = 454.93 MB = 87.2% (-12.8% reduction)
tsidxWritingLevel 4: (403.32 + 413.07 + 407.33) / 3 = 407.91 MB = 78.2% (-22.8% reduction) - winner!

More or less as expected, the hightest tsidxWritingLevel=4 had most optimizations regarding storage. For this test case ~20% less disk space is needed for the index files. Notice that these really only apply for the given dataset, your results may vary.

As we see the ingest time seems slightly but not noticeable higher. The load avg displays the 1min, 5min, 15min for documentation purposes. It is expectable that 5 and 15 min load is rising during the test.. in general the avg 1 min load is quite comparable.

What about metric indexes?

For metrics we created a sample csv of 15mio. events containing 15 metrics with different values and 3 static dimensions. The resulting rawdata is 810MB.

Splunk Enterprise v8.1.5

run number	tsidxWritingLevel	time taken ingest	Bucket sizeOnDisk (MB)	load avg
1	1	95s	205.51	2.57, 1.34, 0.66
2	1	95s	204.38	2.96, 1.89, 0.95
1	2	95s	203.97	3.61, 2.4, 1.26
2	2	95s	205.30	3.13, 2.55, 1.47
1	3	95s	204.72	2.82, 2.57, 1.62
2	3	95s	204.36	2.93, 2.69, 1.8
1	4	95s	204.18	2.75, 2.61, 1.89
2	4	95s	204.53	2.82, 2.58, 1.97

As we see there is almost no difference in the resulting bucketsize metrics. All results vary less than 1%. All runs have the same ingest time.

We learned that tsidxWritingLevel of metric indexes have no impact on storage size so far.

how to i test my data?

Here you find a link to the git repo where the tests are documented. You can adjust the config.yml file to create your own tests with your own data.

Notes

There is a bug affecting version v8.1.1, v8.1.2, v8.1.3 (fixed in v8.1.4) and v8.2.0 (fixed in 8.2.1) documented as SPL-197930. There indexers have huge memory spikes and may crash when tsidxwritinglevel = 4 is set: see link.

Howto recover "| delete"ed events

2021-10-11T00:00:00+00:00

After a while we want to restart the blog series. Today we want to show how to recover data in splunk which had been deleted using the "| delete" command.

Even if “| delete” is not a very common command, it’s used from time to time to clean up unwanted events. So what happens if you delete data by mistake? How to recover those events when the docs say it’s not possible?

When we look at the documentation it’s stated that “Removing data is irreversible. If you want to get your data back after the data is deleted, you must re-index the applicable data sources." and “Using the delete command marks all of the events returned by the search as deleted." Re-index of data is quite often impossible when data is coming from transient data sources like MQTT or REST interfaces.

Well, what if we find these markers and remove them? Normally this should bring us in the situation where data is searchable again, right?!

All event data in Splunk is stored in indexes. Every index consists of buckets, which are folders with a predefined naming convention. Let’s have a look at those buckets and compare a bucket with deleted and non-deleted data.

examine the bucket structure before deletion

Let’s search for some data. Please note that the internal field _bkt is the bucket where an event is stored.

splunk search "index=test sourcetype=testjson earliest=0 | stats count values(_bkt) as bucket values(index) as index | table index count bucket"
index count                   bucket
----- ----- -------------------------------------------
test    200 test~5~8318D59B-46EF-45B4-ACFA-AB89AAF73434```

Here we find 200 events of sourcetype testjson in the index “test” in the bucket test~5~8318D59B-46EF-45B4-ACFA-AB89AAF73434. Let’s jump into the filesystem structure and find this bucket/folder.

db_1623401960_1623401955_5
├── 1623401960-1623401955-14536582616925630097.tsidx
├── Hosts.data
├── SourceTypes.data
├── Sources.data
├── bloomfilter
├── bucket_info.csv
├── optimize.result
└── rawdata
    ├── 0
    └── slicesv2.dat

total 56
-rw-------  1 andreas  staff  4640 11 Jun 11:03 1623401960-1623401955-14536582616925630097.tsidx
-rw-------  1 andreas  staff   103 11 Jun 11:03 Hosts.data
-rw-------  1 andreas  staff   103 11 Jun 11:03 SourceTypes.data
-rw-------  1 andreas  staff    94 11 Jun 11:03 Sources.data
-rw-------  1 andreas  staff    49 11 Jun 11:03 bloomfilter
-rw-------  1 andreas  staff    67 11 Jun 10:59 bucket_info.csv
-rw-------  1 andreas  staff     0 11 Jun 11:03 optimize.result
drwx------  4 andreas  staff   128 11 Jun 11:03 rawdata

This is how a bucket looks like: The rawdata subdirectory contains the original events in a compressed format. The *.tsidx files are the index over those rawdata events. *.data files are holding meta information about the rawdata source, sourcetype and hosts fields.

checking bucket structure after deletion

We run all commands from the cli, as this might be easier to read in the article. Now let’s delete some data using the “| delete” command.

splunk search "index=test sourcetype=testjson earliest=0 | delete"
INFO: 200 events successfully deleted
      splunk_server         index  deleted errors
-------------------------- ------- ------- ------
andreass-MacBook-Pro.local __ALL__     200      0

and search for the data again:

splunk search "index=test sourcetype=testjson earliest=0 | stats count"
count
-----
    0

It seems as if the data is now deleted, or rather “marked as delete” successfully.

Now let’s run tree and ls again:

db_1623401960_1623401955_5
├── 1623401960-1623401955-14536582616925630097.tsidx
├── Hosts.data
├── SourceTypes.data
├── Sources.data
├── bloomfilter
├── bucket_info.csv
├── optimize.result
└── rawdata
    ├── 0
    ├── deletes
    │   └── 8c1659e22188a580759cbf34a6e26308.csv.gz
    └── slicesv2.dat

total 56
-rw-------  1 andreas  staff  4640 27 Sep 18:07 1623401960-1623401955-14536582616925630097.tsidx
-rw-------  1 andreas  staff   103 11 Jun 11:03 Hosts.data
-rw-------  1 andreas  staff   103 11 Jun 11:03 SourceTypes.data
-rw-------  1 andreas  staff    94 11 Jun 11:03 Sources.data
-rw-------  1 andreas  staff    49 11 Jun 11:03 bloomfilter
-rw-------  1 andreas  staff    67 11 Jun 10:59 bucket_info.csv
-rw-------  1 andreas  staff     0 11 Jun 11:03 optimize.result
drwx------  5 andreas  staff   160 27 Sep 18:07 rawdata

recovering the data

Well, let’s have a closer look at the buckets 1623401960-1623401955-14536582616925630097.tsidx file. From the timestamp we see that this file has been modified while we deleted the data. Let’s try to rebuild this .tsidx file. So we stop Splunk and run rebuild using the “splunk fsck repair” command.

splunk stop
rm -f /Users/andreas/splunk/var/lib/splunk/test/db/db_1623401960_1623401955_5/1623401960-1623401955-14536582616925630097.tsidx

splunk fsck repair --one-bucket --bucket-path=/Users/andreas/splunk/var/lib/splunk/test/db/db_1623401960_1623401955_5

splunk start

and search again:

splunk search "index=test sourcetype=testjson earliest=0 | stats count"
count
-----
    0

No luck.. data still not searchable. Looks like we have overseen something.

Let’s have a closer look at the bucket.. see the “deletes” subdirectory from rawdata? This directory and it’s content was also created when we deleted the data. Now, let’s stop Splunk, remove the “deletes” subdirectory and repair the bucket again.

splunk stop
rm -rf ~/splunk/var/lib/splunk/test/db/db_1623401960_1623401955_5/rawdata/deletes

splunk fsck repair --one-bucket --bucket-path=/Users/andreas/splunk/var/lib/splunk/test/db/db_1623401960_1623401955_5

splunk start

Voila, the data is available again.

splunk search "index=test sourcetype=testjson earliest=0 | stats count"

count
-----
  200

what happend?

When running the “| delete” command, Splunk is actively changing the .tsidx index files to ensure searching of the deleted data is not possible anymore.

Besides that, the subdirectory “deletes” with markers to the deleted events in rawdata is created. Those markers come into play when the bucket is recreated from rawdata. Those recreation will take place if you thaw a frozen bucket from an archive or make replicated buckets searchable using index clustering. For that reason the first recovery attempt failed, as the “deletes” directory just marked the events in the index “deleted” again.

This mechanism makes Splunk Enterprise consuming more storage when you are deleting data.

If you would just restore the .tsidx file from your backup the events are immediately searchable again - even without a splunk restart. BUT if you are not cleaning up the “deletes” directory, events will be marked as deleted at the next index rebuild. So be aware when recovering buckets from backup: always recovery the entire bucket and ensure the “deletes” subdirectory is also deleted!

Happy backup & restore!

a short introduction to Hugo

2021-10-07T00:00:00+00:00

Recently Andreas has migrated our website from Wordpress to Hugo and after that i have written this small introduction for a TLUG workshop. The article describes the basic installation and configuration steps plus the creation of the first content.

What and why is hugo?

static website generator that
- supports separation of content and code with themes
- renders your content into static html pages with css and optional javascript integration
- can be fully automated with the tool (Ansible, Jenkins, Helm etc.) of your choice
- is written in Go and released as static binary
to simplify webserver infrastructures by avoiding
- databases like MySQL
- server side program languages like PHP
- caching components like Redis

Installation

the official documentation describes the installation
- from source
- using a Github release for several operating systems and cpu infrastructures
- for Windows and other operating systems with the related package managers
- with homebrew for MacOS

brew install hugo

Configuration

create the website folder structure
choose a theme and create the folder for it
download and extract the theme into the folder

hugo new site --source ./ test.batchworks.de
mkdir -p test.batchworks.de/themes/hugo-geekdoc/
curl -sL https://github.com/thegeeklab/hugo-geekdoc/releases/download/v0.19.1/hugo-geekdoc.tar.gz | tar -xz -C test.batchworks.de/themes/hugo-geekdoc/

extend the hugo configuration (config.toml) with the desired options
this also includes some theme related options
- Theme options are not namespaced. This makes it sometimes hard to switch between them

baseurl = "http://test.batchworks.de/"
languageCode = "en-us"
title = "test.batchworks.de"
theme = "hugo-geekdoc"

[permalinks]
  posts = "/:title/"
  page = "/:slug/"

## geekdoc theme settings

# Required to get well formatted code blocks
pygmentsUseClasses = true
pygmentsCodeFences = true
disablePathToLower = true
enableGitInfo = false

[markup]
  [markup.goldmark.renderer]
      unsafe = true
  [markup.tableOfContents]
    startLevel = 1
    endLevel = 9

[taxonomies]
  author = "authors"
  tag = "tags"

## geekdoc theme settings end

# Theme variables
#
[params]
  # Site author
  author = "Birk Bohne"

  geekdocBreadcrumb = false

  # Format dates with Go's time formatting
  date_format = "Mon Jan 02, 2006"

Content

the first page

Hugo pages or posts are written in Markdown with some theme specific header config
create the first page in the test.batchworks.de/content/_index.md file and add the content

---
title: Welcome to the test site
geekdocDescription: This is the start page.
weight: 10
---

# The start page

Welcome to the start page

local test with hugo

start hugo in server mode to locally render the content

hugo server --source test.batchworks.de/ --baseUrl http://localhost

Start building sites …
hugo v0.88.1+extended darwin/amd64 BuildDate=unknown

                  | EN
-------------------+------
  Pages            |   7
  Paginator pages  |   0
  Non-page files   |   0
  Static files     | 105
  Processed images |   0
  Aliases          |   2
  Sitemaps         |   1
  Cleaned          |   0

Built in 31 ms

check the result within a browser by opening http://localhost:1313/

update the first page

paste the content of the README.md below the theme config section in the test.batchworks.de/content/_index.md file
check your browser

content structure with folders

create a test.batchworks.de/content/sub/ folder and an _index.md file in it
add the content

---
title: Sub content
geekdocDescription: focus on other topics
weight: 10
---

the sub page

check your browser again

add charts

the Geekdoc theme supports Mermaid for the rendering of different chart types
create a test.batchworks.de/content/sub/mermaid.md file
add the content

---
title: Mermaid charts
geekdocDescription: render charts with mermaid markup code
weight: 10
---

{{< mermaid class="text-center">}}
flowchart TD
    A[Start] --> B{Is it?};
    B -->|Yes| C[OK];
    C --> D[Rethink];
    D --> B;
    B ---->|No| E[End];
{{< /mermaid >}}

check your browser again

use structured data sources

Hugo can read data structures for instance from JSON or YAML
this can be used to populate tables or other html structures
First we have to create the html template
- create a test.batchworks.de/layouts/shortcodes/data_table.html file
- add the content

{{ $arg0 := .Get 0 }}
{{ $data := index .Site.Data.content $arg0 }}
{{ $.Scratch.Set "count" 0 }}

<table>
    <thead>
      <tr>
        <th>Nameth>
        <th>Functionth>
      tr>
    thead>

    <tbody>
    {{ range $datacontent := $data }}
        <tr>
            <td>{{ $datacontent.name }}td>
            <td>{{ $datacontent.function }}td>
        tr>
    {{ end }}
    tbody>
table>

Next we have to create the data source
- create a test.batchworks.de/data/content/data.yaml file
- add the content

- name: "json"
  function: "read JSON"
- name: "csv"
  function: "read CSV"

Now we create the page that should render the table
- create a test.batchworks.de/content/sub/table.md file
- add the content

---
title: Data tables
geekdocDescription: render tables
weight: 20
---

## Data sources

{{< data_table "data" >}}

check your browser again

Render the site

use Hugo to render the static pages
they can than be served by the webserver of your choice without further dependencies
- create a target and a tmp dir
- call hugo to render the pages

mkdir -p tmp/ www/
hugo --source ${PWD}/test.batchworks.de/ --cacheDir ${PWD}/tmp --destination ${PWD}/www --baseURL http://test.batchworks.de

Further steps

maintain the Hugo config and content in a source code management system
maintain the rendered pages in an SCM too
upload the www folder to your web server
use automation tools like Ansible, Jenkins etc. to automate the workflow

collectd2 App for Splunk Enterprise released

2018-05-10T00:00:00+00:00

Today we released an enhanced version of the collectd app for Splunk. As the app is using metric index and enhanced mstats command, you will need to use Splunk Enterprise version 7.1.

Further information could be found on Splunkbase (https://splunkbase.splunk.com/app/4005/) and the github repo located at: https://github.com/schose/collectd2.

published Splunk Technology Add-On for Mikrotik RouterOS

2018-01-10T00:00:00+00:00

As some of you know we love these small Mikrotik boxes running RouterOS. They are offering a rich feature set and functionality at a very reasonable price. We also love Splunk.. so it makes perfect sense to import RouterOS data into Splunk. To have greater value of your data we’ve created a Splunk Technology Add-On for RouterOS.

Development takes place in the git repo hosted at https://git.batchworks.de/andreas/TA-routeros . You can download it from there or from https://splunkbase.splunk.com/app/3845/.

Data is extracted for the Splunk CIM data models network traffic, name resolution (DNS), DHCP and authentication.

Why using XML Event Logs sucks using Splunk

2017-10-06T00:00:00+00:00

Yesterday I had a discussion with a colleague if we should switch the indexing of Windows Eventlogs to XML. He mentioned that he was told that it’s faster, needs less data volume and language agnostic.

As I couldn’t imagine that something with the abbreviation “XML” in it could be something like “small” and “fast” I decided to do a test.

Interesting enough there is a blog article found at http://blogs.splunk.com/2014/11/04/splunk-6-2-feature-overview-xml-event-logs/ also stating that you would have an data reduction.Interesting enough there is a blog article found at http://blogs.splunk.com/2014/11/04/splunk-6-2-feature-overview-xml-event-logs/ also stating that you would have data reduction.

You start to collect XML Events with adding renderXml = 1 to the input stanza. When doing so the suppress_text = 1 is automatically set. Of course you could also omit the Eventlog message for your non-XML input and achieve the same volume reduction. Here I will keep the Eventlog message for the XML and non-XML scenario to ensure not comparing apples and oranges.

Index performance

The same dataset was indexed by the same forwarder on the same hardware.. let’s determine the time needed for indexing:

index=noxml OR index=xml | stats count earliest(_indextime) as ite latest(_indextime) as itl by index | eval timediff = itl-ite | convert ctime(ite) ctime(itl)

Indexing the XML took 195 seconds vs. 153 seconds, 27,5% longer.

Size

Determine the indexsize:

| dbinsprect index=noxml OR index=XML | table index SizeOnDiskMB

XML needed 17,7% more storage.

Search performance

Running a noncomplex search, I can’t remember if I was more surprised that the XML search was more than 10X slower or that it showed a different result count. I repeated the search 3 times to ensure the results are accurate.

index=noxml | stats count by EventCode – (fast mode enabled)

This search has completed and has returned 223 results by scanning 56,807 events in 2.923 seconds. This search has completed and has returned 223 results by scanning 56,807 events in 2.91 seconds. This search has completed and has returned 223 results by scanning 56,807 events in 2.888 seconds.

index=xml | stats count by EventCode – fast mode

This search has completed and has returned 106 results by scanning 56,807 events in 33.822 seconds. This search has completed and has returned 106 results by scanning 56,807 events in 33.593 seconds. This search has completed and has returned 106 results by scanning 56,807 events in 33.309 seconds.

All time ran into command.search.kv

I found a lot of events where the EventID is not extracted correctly from the XML..

index=xml OR index=noxml sourcetype=*application* RecordNumber=9172 | table EventCode sourcetype index _raw

All tests was done on “latest an greatest” Splunk TA Windows v4.8.3 runing on Splunk Enterprise v6.5

Summary

Do never ever use XML rendering because of performance or expected data volume reduction. For now the only valid reason seems to be to overcome language issues..

How use Splunk DB Connect with H2 databases

2017-01-09T00:00:00+00:00

Today I got a question from a colleague if it’s possible to connect a H2 database engine to Splunk. It would be great to index events from that database – as it contains security events coming from an anti-virus system.

To index events based on a RDBMS there is Splunk’s well-known DB Connect app (https://splunkbase.splunk.com/app/2686/). Unfortunately the DB Connect support matrix doesn’t mention anything with H2 database – so I decided to test it out.

H2 Database

I’ve never run into H2 before, it really seems to be a niche product. The installation is downloading and extracting the .zip file – awesome! It’s the size of 1.5 MB and has a great feature set like In-Memory Mode, Built-in Clustering / Replication…

http://www.h2database.com/html/features.html#comparison

By default H2 has 2 connection modes:

Embedded/local mode: only local connections using JDBC
Server mode: remote connections using ODBC or JDBC
mixed

DB Connect setup

As always you need to install Java 8 for DB Connect. Even if openjdk is working fine I always recommend to use Oracle Java for support reasons. Extract DB Connect to your $SPLUNK_HOME$/etc/apps path and run the setup wizard if you have a full Splunk installation.. otherwise you can edit app.conf and inputs.conf to enable it and set the JRE path correctly.

app.conf

[install]
is_configured = 1

inputs.conf
[rpcstart://default]
javahome = /usr/local/jre1.8.0_111
useSSL = 0

Full installation procedure is documented in splunk docs at http://docs.splunk.com/Documentation/DBX/2.4.0/DeployDBX/Checklist .

Next you need to download H2 database (http://www.h2database.com/h2-2016-10-31.zip), extract and copy the bin/h2-1.4.193.jar to $SPLUNK_HOME$/etc/apps/splunk_app_db_connect/bin/lib directory.

Next configure a custom db type by creating the config file $SPLUNK_HOME$/etc/apps/splunk_app_db_connect/local/db_connection_types.conf. This is not implemented in the DB Connect WebGUI.

db_connection_types.conf:

[h2tcp]
displayName = H2-tcp
serviceClass = com.splunk.dbx2.DefaultDBX2JDBC
jdbcUrlFormat = jdbc:h2:tcp://: /
jdbcDriverClass = org.h2.Driver

[h2local]
displayName = H2-local
serviceClass = com.splunk.dbx2.DefaultDBX2JDBC
jdbcUrlFormat = jdbc:h2:/
jdbcDriverClass = org.h2.Driver

The [h2tcp] stanza defines the connection for server mode, while [h2local] defines embedded/local mode. After doing so and restarting Splunk you’ll see two new driver entries in DB Connect – stating “unsupported”

Create credentials first, followed by a connection. Make sure to use TCP/9092 when connecting to a remote H2 instance. The remote instance has to be started using the –tcpAllowOthers parameter.

A new connection will be saved in db_connections.conf. This is an example:

[h2remote]
connection_type = h2tcp
database = /tmp/h2demo
host = 127.0.0.1
identity = sa
jdbcUrlFormat = jdbc:h2:tcp://: /
jdbcUseSSL = 0
port = 9092

When defining an input to pull events out of the db this is done like always in inputs.conf. Here is an example:

inputs.conf

[mi_input://h2remote-users]
connection = h2remote
enable_query_wrapping = 1
index = test_high
interval = 60
max_rows = 10000
mode = tail
output_timestamp_format = yyyy-MM-dd HH:mm:ss
query = SELECT * FROM INFORMATION_SCHEMA.USERS
sourcetype = dbx:h2
tail_rising_column_name = ID
ui_query_mode = advanced
tail_rising_column_checkpoint_value = 2

H2 restrictions

Other than you might expect it’s not possible to use two applications writing or reading in local/embedded mode. You’ll receive the message “org.h2.jdbc.JdbcSQLException: Database may be already in use: null. Possible solutions: close all other connection(s); use the server mode [90020-193]”. This is by design and can be solved as mentioned before by starting H2 with the –tcp for local-only connections or –tcpAllowOthers for all other connections.

manually roll Splunk buckets from hot to warm

2016-05-19T00:00:00+00:00

As you might know indexes are where your data in splunk is stored. An index contains of time-based buckets (directories). Over time a bucket – the indexed data – is rolling from hot (when data is still written to the bucket) to warm (data is read-only) to cold. When you want to backup Splunk you need the data in a consistent state – in a warm bucket.

There are different situations when a bucket is rolled from hot to warm:

restart of the splunk service
parameter maxDataSize in indexes.conf is hit (750MB default, 10GB with auto_high_volume)

here’s an overview of indexes.conf parameters:

As you see there is no time but only a size value when Splunk rolls a bucket from hot to warm. There are situations where you manually want to roll a bucket from hot to warm. Here is a hot bucket in an index named “bwindex” with the ID 37.

You can force Splunk to roll the bucket with using this command: splunk _internal call /data/indexes/INDEXNAME/roll-hot-buckets -auth admin:password Where INDEXNAME the name of the index to roll.

You can also trigger the rolling on a remote indexer using curl: curl –k https://localhost:8089/services/data/indexes/bwindex/roll-hot-buckets -x POST -u admin:password After that we check again on the hot buckets and see that there is a new hot bucket with ID 38. The old bucket has been renamed to db_timestamp_timestamp_ID.

There might be situations in the real world where you want to roll the hot buckets manually. There was an index replication cluster where the search factor and replication factor wasn’t met. The error message was “Cannot fix search count as the bucket hasn’t rolled yet”. As this could take up to 90 days to complete (if the data volume is small enough) we want to force the indexer to roll the bucket.

There is a screenshot:

Splunk v6.4

There is a feature easy to overlook in Splunk6.4 which is named “Force roll specific hot buckets“. You can find the documentation here. There is a new REST endpoint /services/cluster/master/control/control/roll-hot-buckets – which makes your life easier. In older versions you need to determine the concrete indexer by matching the GUID from the bucket information (e.g.: _audit~2~1A3889D7-954B-4CE6-B071-01B438DE9865) and send the REST request to the cluster peer directly.

old method – pre v6.4(for every indexer and bucket):

determine the indexer with rest query | rest /services/cluster/master/peers splunk_server=local | table id label status last_heartbeat
roll the bucket: curl –k https://clusterpeer:8089/services/data/indexes/INDEXNAME/roll-hot-buckets -x POST -u admin:password

Now you can force the cluster master to advice the cluster peer to roll the bucket.

new method (for every bucket): curl -k -u username:password https://localhost:8089/services/cluster/master/control/control/roll-hot-buckets -X POST -d "bucket_id=_audit~2~1A3889D7-954B-4CE6-B071-01B438DE9865"

Hope this helps..

Nagios Plugin to check Splunk license usage

2015-02-20T00:00:00+00:00

In today’s article about Splunk monitoring we want to monitor the Splunk license usage. You want to keep an eye on the license usage, as 5 warnings of the daily indexing volume using the enterprise license or 3 warnings using the free license will cause a license violation.

A license violation will deactivate Splunk searches but not the indexing process. So you will not be able to query your data – but at least never loose it.

Typically a license warning is displayed in the web console of Splunk.

This warning is fine – but you want to get a notification using your normal monitoring and escalation process it’s simply not enough. For that reason I created a Powershell script which queries Splunk for the amount of indexed data and creates warningor critical events in your monitoring solution (e.g. Nagios)

As in the other monitoring articles for checking client versions and connections to Forwarder Management I’m using Splunk Powershell resource kit. Again – you will just need a Windows machine for executing the Powershell script – your Indexers could be running on non-Windows machines.

Setup monitoring using nsclient++ on Windows

Find the Download for the Script here.

Download and extract the files to C:\Program Files\NSClient++\scripts\splunk
Adjust your “C:\Program Files\NSClient++\nsclient.ini” and add the external script

[/settings/external scripts/scripts]
check_splunklicense = cmd /c echo scripts\\splunk\\check-license.ps1 -servername $ARG1$ -port $ARG2$ -username $ARG3$ -password $ARG4$ -warn $ARG5$ -critical $ARG6$; exit($lastexitcode) | powershell.exe -command –

On the Nagios server: create a new command using NRPE

define command{
command_name nt_nrpe_splunklicense
command_line /usr/lib/nagios/plugins/check_nrpe -t 30 -H $ARG1$ -p 5666 -c check_splunklicense -a $ARG2$ $ARG3$ $ARG4$ $ARG5$ $ARG6$ $ARG7$
}

On the Nagios server: add a service to your host definition

define service{
use generic-service
host_name splunkindexer.bwlab.loc
service_description splunk license check splunk-2
check_command nt_nrpe_splunklicense!1.1.1.1!1.1.1.2!8089!admin!yourpassword!380!500
}

As you see at the command and service definition the first argument is the host where the Powershell script will be executed (1.1.1.1). The second and following arguments gives the Splunk indexer hostname (1.1.1.2) and credentials for login. The 380 and 500 pieces are the thresholds in MB for warning and critical triggers in Nagios.

Parameters

Here is a detailed list of the script parameters:

-servername

the servername or ip address to be checked – default localhost

-port

port of splunkd – default 8089

-protocol

protocol to use to communicate with splunkd – default: https

-timeout

connectiontimeout to splunkd in milliseconds – default 5000

-username

username to use to login to splunkd

-password

password to use with splunkd

-pool

licensepool to check – default “auto_generated_pool_download-trial” ..
freeversion is “auto_generated_pool_free”

 -warn

warningvalue in Megabytes

-critical

critical value in Megabytes

-showpool

display all pools found on the indexer and usage. Values could be 0 (default: don’t display) or 1 (display)

If you are unsure which license pool to use check the -showpool parameter. It will display all license pools on the indexer and the used bytes.

if everything is setup correctly you will be honored with great check for your licensing and will never miss a warning again.

automatic Rolling Pool Upgrade from XenServer 6.1 to XenServer 6.2

2015-02-17T00:00:00+00:00

Alright, XenServer 6.2 has been released a few days ago. There seem to be no well-known update issues – so time to get the job done…

Things to consider BEFORE upgrading

This is the first release which massively removed features from the product. If you are using:

Workload Balancing (WLB) or associated features (power down hosts…)
SCOM Integration
Backups using Virtual Machine Protection and Recovery (VMPR)
Web Self Service

. This update is not for you! Check out http://support.citrix.com/article/CTX137826 for further information. Second, you need a Windows Host for the Rolling Pool Upgrade. Install and update XenCenter before proceeding …

Prepare the installation media

The easiest way to provide the installation media to XenServer 6.1 is NFS. Here you do not have to care for IIS MIME Type issues or FTP ascii/binary stuff. Just click & serve: Add the NFS Server Role, create a folder and copy the content of the downloaded XenServer-6.2.0-install-cd.iso from Xenserver.org there and create a NFS share. The first step is to add the NFS server role

Create the NFS Share. Default permissions are fine, since they will give read access to every host.

To be on the safe side you should check if the NFS share can be accessed. Run XenServer Console or login via SSH to XenServer poolmaster and mount the NFS share like in my example:

mkdir /mnt/nfstest
mount -t nfs NFSServer:/NFSSHARENAME /mnt/nfstest
ls -l /mnt/nfstest
umount /mnt/nfstest

Rolling Pool Upgrade Wizard slideshow..

Now it’s time to start XenCenter and run the wizard:

run the wizard

Select your pool. Nothing can be done wrong here

Select “automatic” as we have prepared the NFS share

Check the prechecks there. As you see Workload Balancing (WLB) and WSS are two features which will be removed after the Upgrade.

Set XenServer installation media NFS Share

Get a coffee – you may check the XenServers console to see if it’s proceeding..

if you check the server’s console – cross your fingers and watch progress bars…

After some time XenCenter will show you update have been successfull.

After relogon to XenCenter you want to verify the Buildnumber

Fix RDP Error 3334 in mRemoteNG and others

2015-02-17T00:00:00+00:00

I’m using MRemoteNG to manage and create my RDP connections. In the last months, every now and then, I was unable to connect RDP sessions. When using mstsc.exe instead the RDP connection was established correctly.

After some research I found the reported mRemoteNG Bug - https://mremoteng.atlassian.net/browse/MR-582. In the Comments somebody suggested to add the /LARGEADDRESSAWARE flag to the mremoteng.exe binary. Tried it, fixed it!

If you haven’t Visual Studio installed, here is a link to a modified mremote.exe Version 1.72.

It seems that every comparable Software like ASG/Visionapp Remote Desktop, Microsoft Remote Desktop Conection Manger (RDCMan) had the same issue in older versions.

Fix GeoIP and Google Maps Apps in Splunk 6.1

2015-02-16T00:00:00+00:00

Today, after upgrading to Splunk 6.1 I realized, that some GeoIP data in dashboards was missing. By using the lookup search command to get the country from an IP address like :

| stats count | eval ip=”193.28.153.192″ | lookup geoip clientip as ip

I got an error message, which showed that the lookup was somehow not working.

As the “geoip” lookup is implemented as a python script I checked the process using procmon..

As we see python.exe – which represents the lookup script located at c:\Program Files\Splunk\etc\apps\MAXMIND\bin\geoip.py – tries to read the Maxmind Database File GeoCityLite.dat and fails because the file is not where expected. In fact the database file is located at app folder c:\Program Files\Splunk\etc\apps\maps\bin\GeoLiteCity.dat, not Program folder c:\Program Files\Splunk\bin\GeoLiteCity.dat.

To fix the issue open the lookup script, uncomment line 5 and comment out line 6:

DB_PATH = os.path.join(os.environ["SPLUNK_HOME"], ‘etc’, ‘apps’, ‘MAXMIND’,'bin’,'GeoLiteCity.dat’)
#DB_PATH=(‘GeoLiteCity.dat’)

The same issue also applies to the Splunk Google Maps app. The command

| stats count | eval ip=”193.28.153.192″| lookup geo ip

returns error code 1 instead of a pin on the map.

you have to adjust the config file c:\Program Files\Splunk\etc\apps\maps\default\geoip.conf to

database_file = c:\Program Files\Splunk\etc\apps\maps\bin\GeoLiteCity.dat

The whole issue looks like a compatibility issue from Splunk 6.0 to 6.1. It seems that lookup scripts are executed in a different working directory.

How to access the local Edgesight Firebird database

2015-02-16T00:00:00+00:00

There are certain troubleshooting scenarios where you want to get access to the data from the local Edgesight database. Technically the Edgesight agent collects all the data from the running local host in a local Firebird database and uploads the data in a configurable timeframe into the central MSSQL database.

So if you have e.g. data missing in the MSSQL database you might want to know if the data is not collected from the agent (you might want to update it) or if the consolidation from the Firebird database to the MSSQL databas is an issue.

Unfortunately the process to connect to the local database is not well documented, so I’m doing this here.

The steps necessary are to setup a second instance of Firebird, attach the local database and browse the database using a GUI tool.

First you need to download the proper version of Firebird database server from http://www.firebirdsql.org. Make sure you download the newest version of the same major release Citrix is using for the agent. To get the major release check the file version of “C:\Program Files (x86)\Citrix\System Monitoring\Agent\Core\Firebird\bin\fbserver.exe”.

Now install the Firebird server with a next->next->next->finish. You should use the “Classic server binary” version.

Open services.msc and search for the “firebird – DefaultInstance” service to make sure Firebird is up and running. Btw.: you will find the Citrix Edgesight Firebird Service next to it, named “Firebird Server – CSMInstance”.

As you want to work with the database using a GUI, I suggest you install Flamerobin from http://flamerobin.org/. Just install the application – “next->next->next->finish.”

Now it’s time to stop your Edgesight and Citrix Firebird services on the Edgesight device.

Fire up Flamerobin and connect to the local newly installed Firebird instance. Choose Server->Register New Server

The running server is localhost and the TCP Port 3050.

Now you need to attach the Citrix Firebird database to this server. Choose “Register existing database…” and select the “RSDATR.FDB” database file.

Make sure to use the user sysdba and password masterkey (this is the default password of the new Firebird instance) to login.

That’s it, you’re done and can now browse and query all the tables, views and triggers of the local database.

ErrorAction for Get-VMHostNetworkAdapter broken

2014-09-11T00:00:00+00:00

The ErrorAction parameter still throws an error even if the “SilentlyContinue” option is selected. This is a problem if you just want to check the existence of NICs on a ESX Host.

$ESXHost = Get-VMHost esx.fqdn -ErrorAction:Stop
$dvSwitch = Get-VDSwitch -Name dvSwitch1 -Location virtualDatacenter -VMHost $ESXHost -ErrorAction:Stop
$Portgroup = Get-VDPortgroup -Name “Management Network” -VDSwitch $dvSwitch -ErrorAction:Stop
Get-VMHostNetworkAdapter -VMHost $ESXHost -VMKernel:$true -VirtualSwitch $dvSwitch -PortGroup $Portgroup -ErrorAction:SilentlyContinue | out-null

If the Host has no vmk NIC configured on the “Management Network” portgroup of dvSwitch1 the CMDlet still throws a error message. My workaround is a empty try/catch function to suppress the error message.

try {
    Get-VMHostNetworkAdapter -VMHost $ESXHost -VMKernel:$true -VirtualSwitch $dvSwitch -PortGroup $Portgroup -ErrorAction:SilentlyContinue | out-null
}
catch {
    #silence
}

Also for this bug a VMware SR is open. Let’s see if this bug is fixed in the next PowerCLI version.

The VMware versions in my development environment.

PowerCLI Version
—————-
VMware vSphere PowerCLI 5.5 Release 2 Patch 1 build 1931983
—————
Snapin Versions
—————
VMWare AutoDeploy PowerCLI Component 5.5 build 1890764
VMWare ImageBuilder PowerCLI Component 5.5 build 1890764
VMware vCloud Director PowerCLI Component 5.5 build 1649227
VMware License PowerCLI Component 5.5 build 1265954
VMware VDS PowerCLI Component 5.5 build 1926677
VMware vSphere PowerCLI Component 5.5 Patch 1 build 1926677
VMware vSphere Update Manager PowerCLI 5.5 build 1302474
—————
vSphere Versions
—————
vCenter 5.5 1891313
ESX 5.5 1331820

Set-VDSwitch NumUplinkPorts Bug

2014-09-11T00:00:00+00:00

During my PowerCLI scripting for the provisioning of new ESX Hosts i have found another bug. The Set-VDSwitch CMDlet does not set the number of uplink ports correctly in every case.

You can change the number of uplink ports from 2 to 4 without problems, but the configuration back to two uplinks silently fails.

Set-VDSwitch -VDSwitch $dvSwitchObject -NumUplinkPorts “4″ -Confirm:$false

The dvSwitch has 4 uplink ports after the configuration change.

Set-VDSwitch -VDSwitch $dvSwitchObject -NumUplinkPorts “2″ -Confirm:$false

The dvSwitch still has 4 uplink ports after this configuration change. The CMDlet finishes without an error, but the configuration is unchanged. Currently i have no automation workaround for that, but i can change the settings in the vSphere Web Client manually.

I have opened a VMware SR. Let’s see if this bug is fixed in the next PowerCLI version.

The VMware versions in my development environment.

PowerCLI Version
—————-
VMware vSphere PowerCLI 5.5 Release 2 Patch 1 build 1931983
—————
Snapin Versions
—————
VMWare AutoDeploy PowerCLI Component 5.5 build 1890764
VMWare ImageBuilder PowerCLI Component 5.5 build 1890764
VMware vCloud Director PowerCLI Component 5.5 build 1649227
VMware License PowerCLI Component 5.5 build 1265954
VMware VDS PowerCLI Component 5.5 build 1926677
VMware vSphere PowerCLI Component 5.5 Patch 1 build 1926677
VMware vSphere Update Manager PowerCLI 5.5 build 1302474
—————
vSphere Versions
—————
vCenter 5.5 1891313
ESX 5.5 1331820

Set-VDUplinkTeamingPolicy has problems with the Failback option

2014-09-11T00:00:00+00:00

Time for the next PowerCLI bug. The Set-VDUplinkTeamingPolicy CMDlet has problems with the “failback” option.

It is possible to switch “Failback” from true to false, but not from false to true. The CMDlet does not throw an error. The setting just stays the same. With the vSphere Client i’m able to change the failback option in both directions.

$dvSwitch = Get-VDSwitch -Name dvswitch1
Portgroup = Get-VDPortgroup -Name “NFS” -VDSwitch $dvSwitch
$Portgroup | Get-VDUplinkTeamingPolicy | Set-VDUplinkTeamingPolicy -FailBack:$true

The CMDlet works as expected, because after this change the failback option is enabled.

$dvSwitch = Get-VDSwitch -Name dvswitch1
Portgroup = Get-VDPortgroup -Name “NFS” -VDSwitch $dvSwitch
$Portgroup | Get-VDUplinkTeamingPolicy | Set-VDUplinkTeamingPolicy -FailBack:$false

The failback option is still enabled and the CMDlet does not throw an error.

Currently i have no workaround for it, but also for this bug a VMware SR is open.

The VMware versions in my development environment.

PowerCLI Version
—————-
VMware vSphere PowerCLI 5.5 Release 2 Patch 1 build 1931983
—————
Snapin Versions
—————
VMWare AutoDeploy PowerCLI Component 5.5 build 1890764
VMWare ImageBuilder PowerCLI Component 5.5 build 1890764
VMware vCloud Director PowerCLI Component 5.5 build 1649227
VMware License PowerCLI Component 5.5 build 1265954
VMware VDS PowerCLI Component 5.5 build 1926677
VMware vSphere PowerCLI Component 5.5 Patch 1 build 1926677
VMware vSphere Update Manager PowerCLI 5.5 build 1302474
—————
vSphere Versions
—————
vCenter 5.5 1891313
ESX 5.5 1331820

Setup and configure Splunk Forwarder on Debian Linux

2014-04-08T00:00:00+00:00

Small and short article for today: how to setup and configure Splunk Forwarder for Debian Linux. This is also valid for all Debian derivates like Ubuntu etc. In the Splunk universe the Forwarder is installed on your machine, collects data and events generated by your OSs and forward it to your Splunk Indexer. In general you should install the Forwarder on all clients generating log files if you want to collect them on the Indexer.

First download the appropriate .deb package (32 bit or 64 bit) from from http://www.splunk.com/download/universalforwarder. Now you can create an unattended setup of Splunk Forwarder with a shell script like this (64 bit Forwarder is used).

#!/bin/bash

# install the package
dpkg -i splunkforwarder-5.0.1-143156-linux-2.6-amd64.deb

 

# accept EULA
/opt/splunkforwarder/bin/splunk start –answer-yes –no-prompt –accept-license

 

# change the adminpassword from changeme to Splunky
/opt/splunkforwarder/bin/splunk edit user admin -password Splunky -auth admin:changeme

 

# point the forwarder to forward all events to splunkserver
/opt/splunkforwarder/bin/splunk add forward-server splunkserver:9997 -auth admin:Splunky

 

# index and watch/monitor all files in /var/log
/opt/splunkforwarder/bin/splunk add monitor /var/log/ -auth admin:Splunky

Don’t forget to adjust the server name and port as well as the user and password to your Splunk Indexer installation.

In default installation Splunk forwarder is binding itself to all network interfaces (0.0.0.0). As this is not necessary and a security risk, you can reconfigure it in the file /opt/splunkforwarder/etc/splunk-launch.conf and add the following lines:

After this a restart of the Splunk daemon is necessary:

# bind splunk to localhost only
echo “# bind splunk to localhost only” >> /opt/splunkforwarder/etc/splunk-launch.conf
echo “SPLUNK_BINDIP=127.0.0.1″ >> /opt/splunkforwarder/etc/splunk-launch.conf

/opt/splunkforwarder/bin/splunk restart

Create the initscripts for startup

/opt/splunkforwarder/bin/splunk enable boot-start

NumberFormatExeption for certain vCOps 5.7.1 custom UI widgets

2013-10-10T00:00:00+00:00

Version 5.7.1 of vCenter Operations has a problem displaying Data in Dashboards if you install the standalone Analytics Server with non english language settings. To workaround the problem add “Duser.language=en” to the vcopsWebService configuration and restart the service.

screenshot	description
	Configure a generic Scoreboard with the option to round by two Decimals.
	The generic Scoreboard throws a NumberFormatExeption before the data is displayed.
	To workaround the error add `wrapper.java.additional.22 = -Duser.language=en` option to the Tomcat Wrapper config located in %ALIVE_BASE%/user/conf/tomcat/wrapper.conf.
	Restart the vcopsWebService to enable the new setting. `net stop vCOpsWebService && net start vCOpsWebService`
	Restart the vcopsWebService to enable the new setting. `net stop vCOpsWebService && net start vCOpsWebService`

Update: VMware has published the KB article 2058431 with a official description of the issue.

SQL query to filter active vCenter Faults in the vCOps database

2013-07-18T00:00:00+00:00

As described in SQL queries to filter vCOps alerts and events it is not possible to filter events and alarms by other things than the resource name column. It is also not possible to filter by alarm type or level.

With that SQL query you can workaround that problem.

use vcops

select dateadd(SECOND, convert(bigint, StartTimeUTC) / 1000, convert(datetime, '1-1-1970 02:00:00')) as Date, Name, MessageInfo
FROM Alarm INNER JOIN AliveResource ON Alarm.RESOURCE_ID = AliveResource.RESOURCE_ID
WHERE Alarm.CancelTimeUTC IS NOT null
AND Alarm.AlarmType = 12
AND Alarm.AlarmLevel = 2
AND AliveResource.RESKND_ID = 20
order by Date desc
;

	Number	Description
AlarmType	12	Fault
AlarmLevel	2	Warning
AlarmLevel	4	Critical
RESKND_ID	18*	vCenter*
RESKND_ID	20*	ESX Host*

Every vCenter Operations installation has unique Resource Kind IDs. You can see your IDs in the ResourceKind table.

Performance difference between GET-VM and GET-VIEW

2013-06-25T00:00:00+00:00

Today a colleague has asked me how to optimize a PowerCLI query for virtual machines in a certain virtual datacenter. He want’s to avoid a expensive GET-VM query and he seeks a solution to do that with get-view.

I checked my scripts to see how i solve this problem and i compared it with the default GET-VM query.

GET-VM

This is a short query with code easy to understand and maintain:

This is a short query with code easy to understand and maintain:

Get-Datacenter -Name “DCNAME” | Get-VM | Where-Object {$_.PowerState -eq “poweredOn”}

First the VIObject for the datacenter is queried, then the piped GET-VM query finds all VM’s inside that datacenter and then the list is filtered by VM’s that are powered on. If you provide the “-Name” option to GET-VM you can search for one or more virtual machines with certain names in that datacenter.

GET-VIEW

With this get-view query you get the same list of virtual machines.

get-view -ViewType VirtualMachine -Filter @{“Runtime.PowerState”=”poweredOn”;”Config.Template”=”false”} -SearchRoot $(get-view -ViewType Datacenter -Property Name -Filter @{“Name” = “^DCNAME$”}| select -ExpandProperty MoRef)

This code is harder to read and you must provide more informations to get the same result. Get-view is restricted to the viewtype virtualmachine, filters the result by the status “poweredon” and “template=false”. With the “-Searchroot” option and a second get-view query for the datacenter, the virtualmachine query is limited to this datacenter.

GET-VIEW with filter

Every time i use the get-view CMDlet i try to limit the number of returned properties with the “-Property” option.

get-view -ViewType VirtualMachine -Property Name,Summary -Filter @{“Runtime.PowerState”=”poweredOn”;”Config.Template”=”false”} -SearchRoot $(get-view -ViewType Datacenter -Property Name -Filter @{“Name” = “^DCNAME$”}| select -ExpandProperty MoRef)

In this case only the Name and the Summary section should be fetched. This makes the lists smaller and the script needs less memory and cpu cycles.

I have compared the runtime of the queries in one of the larger vCenter installations. It is interesting to see that GET-VM is faster than GET-VIEW without filters applied. VMware seems to have optimized the GET-VM CMDlet in the current PowerCLI version (5.1 Release 2 build 1012425).

	Get-VM	GET-VIEW	GET-VIEW with a property filter
# virtual Machines	1159	1159	1159
Runtime	23.08s	40.53s	6.29s
Runtime per VM	0,017s	0,035s	0,005s
Speed up		-56%	366%

1000 percent speed up for Get-vCOpsResourceMetric

2013-06-18T00:00:00+00:00

Currently VMware does not provide a PowerCLI snapin to access the vCenter Operations API (the HttpPostAdapter). Clint Kitson and Alan Renouf have developed a Powershell Module to fill that gap.

The Module provides the CMDlets Get-vCOpsResourceAttributes, Get-vCOpsDBQuery and Get-vCOpsResourceMetric. I use Get-vCOpsResourceMetric in a customer project to fetch the “Active Memory” of virtual machines to calculate new RAM reservation values. The Module has saved me a lot of development time, but the runtime is too high if you have more than a few VM’s.

The Get-vCOpsResourceMetric CMDlet uses several Arrays to process the metric data that has been fetched from vCenter Operations. I have replaced that arrays with LinkedLists, because if you have many entries in the arrays they are really slow in Powershell/.net. Since Luc Dekens has provided that hint in his PowerCLI session during VMWorld 2011, i use it in many scripts to raise the performance of my scripts. You can find some additional infos on James Brundage’s page start-automating.com. I also moved the creation of PSobject that contains the final data structure out of the inner foreach loop and i have removed the date conversion from the CMDlet. Especially the date conversion from Unix time stamp to a local date format is very time consuming. The PSoject still includes the timestamp and you can make the conversion in a later step if you need that for some metrics. You can download the updated version of the ps_vcops.psm1 Module. Maybe Clint and Alan can update the original version to include the changes and provide a option to enable the date conversion.

	original script	optimized script with date conversion	optimized script without date conversion
CMDlet	Get-vCOpsResourceMetric	Get-vCOpsResourceMetricoptimizeddateincluded	Get-vCOpsResourceMetricoptimized
Metric	mem\|active_average	mem\|active_average	mem\|active_average
Timeframe	180 days	180 days	180 days
# Values	30557	30555	30555
Runtime	0h 4m 54s	0h 3m 15s	0h 0m 28s
Runtime per Value	0.009631s	0.006371s	0.000916s
Speed up		151,16%	1051,41%

I have used that calls to run the CMDlets and they provide the data shown in the screenshots:

$VM | Get-vCOpsResourceMetric -metricKey “mem|active_average” -startDate (Get-Date).AddHours(-4320) -includeDt:$false -includeSmooth:$false

$VM | Get-vCOpsResourceMetricoptimizeddateincluded -metricKey “mem|active_average” -startDate (Get-Date).AddHours(-4320) -includeDt:$false -includeSmooth:$false

$VM | Get-vCOpsResourceMetricoptimized -metricKey “mem|active_average” -startDate (Get-Date).AddHours(-4320) -includeDt:$false -includeSmooth:$false

If you have more than a few VM’s in your environment or you want to use a big timeframe for your vCOps metrics this optimization gives you big time savings. For more information about the usage of the HttpPostAdapter open the documentation url [https://vcopshost/HttpPostAdapter] of your vCOps installation.

Thanks to Clint and Alan for developing that Module!

Sporadic network issues using Samsung Series 9 Ultrabook

2013-05-30T00:00:00+00:00

I’m writing this article as I know some colleagues of mine are working with the same Samsung Series 9 notebook like me – and I hope I can help not trapping into a pitfall.

The last days I was heavily using my WiFi. From now and then I had some network issues – but I suspected the WiFi network I was connected to – as we all know about the quality of hotel WiFis. Furthermore I never had WiFi issues before.

Today when I was sitting in an office with public WiFi – during the first working hour everything was running fine. Later I got massive packet loss over the WiFi. A ping to the router and google DNS was looking like this:

First I also suspected a WiFi issue, switched to my MiFi device and after 1 hour connection issues occurred as well.

The packet losses occurred very sporadic – sometimes after 5 minutes, sometimes it was running fine for half an hour or more. Disconnecting and reconnecting to the WiFi seemed to fix the issue for some time.

I verified that my notebook is really the culprit by running a ping application from my iPhone.

The Intel WiFi driver for Centrio Advanced-N 6235 looked pretty recent but I remembered I did an update using “Easy Software Manager” round about 3 weeks ago.. so I decided to roll back to the previously installed driver.

After Rolling back version 15.5.6.48 instantly fixed the network issues.

When searching for the driver version I found a thread at the Intel forums confirming that the network driver seems to have issues.

http://communities.intel.com/message/188548

As the Centrio Advanced-N 6235 seems to be used in multiple Samsung series other models seems to be effected as well. For now it seems not sure if the Problem is Windows8 only.

Conclusion: Intel seems not to be interested to fix the problem and Samsung is still(!) deploying a crappy driver with their Software.. be aware – make sure not to run the recent driver version!

Update: After I did further tests it seems that driver version 15.5.6.48 is quite stable but having latency issues. As I am sick&tired and need a working solution I decided to buy a 15EUR mini USB dongle. Bye Bye Intel WiFI!!!

Remote Desktop Client hanging at "Securing remote connection…"

2013-02-18T00:00:00+00:00

Today we were troubleshooting a strange behavior in a customer environment. When connecting to several target machines the Remote Desktop Client hung for about 30 seconds at “Securing remote connection”. The issue seemed to be RDP Version related, as connections to Windows XP/2003 machines were established fast – while Vista/2008 or higher showed the issue.

As 15-30 seconds are quite often TCP timeouts we did a network trace for further analysis. The analysis showed that while the RDP client hung at “Securing remote connection…”, it tried to access ctldl.windowsupdate.com.

Note – dear network admin: This is a classic example of bad network design. The client was located in an isolated network but was able to lookup public targets and tried to access one of them. Because your IP firewall is dropping packages instead of rejecting them, the client will never get a notification that a connection could not be established and instead wait until the timeout is reached.

So if your network does allow lookups to external resources, and there are NO good reasons to do so, make sure to reject connections – at least from your own network – instead of dropping them. For maximized security strictly disable external DNS in isolated networks to avoid DNS tunnel attacks.

Note –the response from the network admin will be: This is a classic example of blaming the network instead of the application. Make sure to configure your OS and application correctly to avoid unnecessary network connections, disable automatic updates and things will be fine.

Well, as only Vista and higher targets shows the hangs we suspected an Root CA updating issue, found http://support.microsoft.com/kb/2677070 and disabled network retrieval.

Updates disabled, timeouts prohibited, mission accomplished – go home early.

Create local Intellicache Storage Repository after Installation

2013-01-23T00:00:00+00:00

One thing that is really missing in XenCenter is the possibility to create local storage repositories. So if you missed to create one during installation, or just want to add more local hdd capacity you have to use the commandline. Come one Citrix – this sucks! – we should have a GUI for this in a Enterprise Product, right?!

I recently added a local SSD to my XenServer to check out Intellicache. The SSD was added in addition to the local HDD so it became device /dev/sdb. First I had to remove all existing partitions using fdisk.

[root@xenserver01 ~]# fdisk /dev/sdb

The number of cylinders for this disk is set to 19457.
There is nothing wrong with that, but this is larger than 1024,
and could in certain setups cause problems with:
1) software that runs at boot time (e.g., old versions of LILO)
2) booting and partitioning software from other OSs
(e.g., DOS FDISK, OS/2 FDISK)

Command (m for help): p

Disk /dev/sdb: 256.0 GB, 256060514304 bytes
255 heads, 63 sectors/track, 19457 cylinders
Units = cylinders of 16065 * 512 = 8225280 bytes

Device Boot Start End Blocks Id System
/dev/sdb1 * 1 13 102400 7 HPFS/NTFS
Partition 1 does not end on cylinder boundary.
/dev/sdb2 13 19458 156185600 7 HPFS/NTFS

Command (m for help): d
Partition number (1-4): 1

Command (m for help): d
Selected partition 2

Command (m for help): p

Disk /dev/sdb: 256.0 GB, 256060514304 bytes
255 heads, 63 sectors/track, 31130 cylinders
Units = cylinders of 16065 * 512 = 8225280 bytes

Device Boot Start End Blocks Id System

After this i checked the SSD for partitions using “p”, found /dev/sdb1 /dev/sdb2 and deleted them using “d” 1,2. I verified that both partitions are deleted using “p” again. Next step is to create a default linux partition with max size (you have to use EXT3 filesystem as local SR for Intellicache) using “n”. Finally I had to write the partition table back to disk and quit using “w” and “q”.

Command (m for help): n
Command action
e extended
p primary partition (1-4)
p
Partition number (1-4): 1
First cylinder (1-31130, default 1):
Using default value 1
Last cylinder or +size or +sizeM or +sizeK (1-31130, default 31130):
Using default value 31130

Command (m for help): w
The partition table has been altered!

Calling ioctl() to re-read partition table.
Syncing disks.

Now I had to introduce the newly created SSD partition (/dev/sdb1) to XenServer using the xe command.

xe sr-create host=xenserver01 content-type=user type=ext device-config:device=/dev/sdb1 shared=false name-label="localssd"

I got back the UUID of the newly created local SR and it occured XenCenter. In this example the UUID is “ad918078-aca8-2a76-81fd-7bbc4b2ba462″

Next I had to configure the Storage Repository as Intellicache. Before I disabled (enable maintainance mode) XenServer.

[root@xenserver01 ~]# xe host-disable host=xenserver01
[root@xenserver01 ~]# xe host-disable-local-storage-caching host=xenserver01

[root@xenserver01 ~]# xe host-enable-local-storage-caching host=xenserver01 sr-uuid=ad918078-aca8-2a76-81fd-7bbc4b2ba462

This did the job – now I could verify the configuration by listing the host properties using “xe host-param-list”. This showed the configuration of the local SR configured for Intellicache “local-cache-sr”. To double-check if Intellicache functionality is enabled on the SR “xe sr-param-list” could be used. Check for the correct value of “local-cache-enabled value”.

[root@xenserver01 ~]# xe host-param-list uuid=c0251040-2ca1-4052-8f2e-aa764ce827e3
local-cache-sr ( RO): ad918078-aca8-2a76-81fd-7bbc4b2ba462
[root@xenserver01 ~]# xe sr-param-list uuid=ad918078-aca8-2a76-81fd-7bbc4b2ba462 | grep -i local-cache-enabled
local-cache-enabled ( RO): true

monitor co-stop with vCenter Operations

2012-10-04T00:00:00+00:00

With ESXTOP or the vCenter performance tab of a VM you can monitor the co-stop metric to check for vSMP performance problems. vCOps can show you that information for all the VMs on a host, a resource pool, a cluster, a vCenter or your complete environment. A single heatmap leads you to the part of your virtual infrastructure that has performance problems, because of there are too many virtual machines with too many vCPUs are deployed. Andy Fox has a good explanation of the co-stop metric and in the VMware communities you can find the excellent document Interpreting esxtop Statistics. The Determining if multiple virtual CPUs are causing performance issues Knowledge Base article indicates that values above 100 are a problem and that you should lower the number of vCPUs for this VM to reduce the co-stop delays.

Just create a new super metric with the formula “co-stop/provisioned CPU cores/20000*100″ and add it to a super metric package as described in my Using Super Metrics to monitor CPU %READY Part 2 article.

Create a new dashboard, add the heatmap and metric graph widget to it and configure both. Details on these steps you can find here.

The heatmap shows you all the virtual machines with co-stop problems at a glance.

scriptbased XenServer guest OS customization

2012-09-24T00:00:00+00:00

One feature that is really missing at Citrix XenServer compared to VMware vSphere is OS Customization. My dear co-blogger Birk (the VMware guy) will never stop to underline this. At vSphere settings like Hostname, Network configuration, Domain membership, Licensing can be defined in specifications and applied to templates and VM images.

How VMware does it, how we workaround with XenServer

VMware vSphere can inject a command from hypervisor into the VM. VMware Tools needs to be started for that.

So at vSphere they are cloning the VM, create a VM-specific sysprep file from specification and inject that file with help of VMware Tools. After that they start the sysprep process and reboot. The “invoke-vmscript” cmdlet from powercli is the powershell implementation of passing the command to the VM.

Easy, simple, reliable but useless for XenServer as there is no possibility to pass a command from the hypervisor to a VM. But even though there is no possibility to prepare and start a sysprep from the hypervisor we can trigger the process from “outside” – as long as there is a network connection available.

Planning the implementation.

Here is our workaround: we generate and trigger sysprep from a “helper machine”. This host needs to have direct network access and credentials to XenServer and to the target VM.

I created a PowerShell script which handles the sysprep from your helper machine. In detail it does the following steps:

read XenServer configuration from a XML config file (url, username, passwort)
connect to XenServer or Poolmaster
read configuration for the VM you want to customize from XML file (template, network, hostname..)
create a clone from template, connect to the correct network
power up new created VM and wait for VM to be started
the VM is assumed running when XenServer tools provide a valid IP address and we are able to connect to specified TCP port (configured in vm configfile). For most scenarios tcp/445 (cifs) or TCP 3389 are common tcp ports to check against.
connecting with specified credentials configured in the VM config file
read custom sysprep answer file configured in the VM config file and transform it in a sysprep file for the target VM
copy the VM specific sysprep.xml file at c:\sysprep\ on the target VM
set the autologin registry key and make sure sysprep is started at next login.
restart the target VM

That’s it: your target VM will reboot now, will autologin with the credentials you supplied and start sysprep. Sysprep will trigger the next reboot and customize everything you prepared.

** Setup example **

Choose a windows machine running at least powershell 2 (could be your workstation). In addition you’ll need to have the XenServer PowerShell CmdLet installed.

Create a script directory (c:\osc in this scenario) and extract the provided files to it.
Adjust hypervisor-sample.xml to your environment: choose poolmaster in “url” parameter if you are running a pool and save it as hypervisor.xml.
Now create a sysprep template file using a Windows WAIK (www.microsoft.com/de-ch/download/details.aspx?id=10333) or adjust my sample-sysprep-domainjoin.xml. Make sure there is the line “REPLACECOMPUTERNAME” in your sysprep.xml – the script is doing a simple string replacement for your hostname matching REPLACECOMPUTERNAME. There is still some room for improvements in the scripts.
Create the target VM config file by renaming and adjusting examplevm01-sample.xml. Make sure you have updated the path to your sysprep file.
Copy the x64 version of reg.exe to C:\osc – no joke! :) – the only reliable way to manipulate remote registry was reg.exe. As powershell binary needs to be started as x86 version a start-process would always start the x86 version and just update the HKLM\SOFTWARE\Wow6432Node registry key. As a workaround we use reg.exe in the C:\osc directory.

Now you can just start you powershell x86 and run the osc.ps1 passing parameters hypervisor.xml and examplevm01.xml to it.

C:\Windows\SysWOW64\WINDOWSPOWERSHELL\V1.0\powershell.exe -file .\osc.ps1 -hypervisorconfig xenserver01.xml -clonevm example01.xml

Summary

As you see it’s possible to add OS customizing functionality with XenServer – even if it’s some work. You’ll need such functionality if you want to create an automated VM deployment.

The mentioned osc.ps1 and configurationfiles are available for download.

Netscaler / AGEE SSL VPN connection issue Windows8

2012-09-13T00:00:00+00:00

Some days ago, I had a small issue with Citrix Netscaler / Access Gateway Enterprise when I tried to establish a SSL VPN connection.

First, the webpage didn’t offer the Access gateway Client for download and installation when I logged in. Strange, but I simply copied the installer file from my Netscaler’s /var/netscaler/gui/vpns/scripts/vista/nsvpnc_setup64.exe to my client and installed it manually. However, after a relogin to Netscaler and starting the VPN, my client tried to start the Java VPN client (fallback scenario).

To work around this issue just set the user agent of you Internet Explorer lower than IE10. Start developer tools with F12 and select Tools->set user agent string-> Internet Explorer 9.

Although Citrix updates Netscaler releases frequently, it seems nobody participated at the Windows8 betas. /* no comment */

SQL queries to filter vCOps alerts and events

2012-09-12T00:00:00+00:00

In the current version of vCenter Operations it is not possible to filter events and alarms by the info column, but only by the resource name column. With SQL queries you can workaround that limitation. Because of that limitation you can only filter by a hostname or datastore name, but you cannot search for certain events like “connection loss to server…”.

In the Alerts Overview you can browse through the list of the generated alerts. In the tree on the left side you can filter by resources like ESX hosts or datastores. On the top left side you have the search form. This search only filters by the “resource name” column, but not by the info column. In this example it means you cannot filter by “Resource is down”.

You can run sql queries against the vcops tables. In this example i use a MS SQL server. The other by vCOps supported database is Oracle. The select queries alerts that match the shown Alert.Info text entries. The results can then be exported into a csv file for further reports.

use vcops

select dateadd(SECOND, convert(bigint, StartTimeUTC) / 1000, convert(datetime, '1-1-1970 02:00:00')) as Date, Name, Info
FROM Alert INNER JOIN AliveResource ON Alert.RESOURCE_ID = AliveResource.RESOURCE_ID
WHERE Alert.Info LIKE 'Lost Connection to NFS server%'
OR    Alert.Info LIKE 'A possible host failure has been detected by HA on host%'
order by Date desc
;

The second example queries alarms (events) that match the shown Alarm.MessageInfo ‘Connection failed for %’ text entries and the AliveResource.RESKND_ID 20. This number is the internal ID for the resource ESX Host.

use vcops

select dateadd(SECOND, convert(bigint, StartTimeUTC) / 1000, convert(datetime, '1-1-1970 02:00:00')) as Date, Name, MessageInfo
FROM Alarm INNER JOIN AliveResource ON Alarm.RESOURCE_ID = AliveResource.RESOURCE_ID
WHERE Alarm.MessageInfo LIKE 'Connection failed for %'
and AliveResource.RESKND_ID = 20
order by Date desc
;

I have raised a vCOps feature request for filtering by info column. Until VMware provides that feature you can use that queries as a workaround.

a custom HTTP port causes service health problems in vCenter 5.0

2012-09-07T00:00:00+00:00

During the vCenter installation process you can config a custom http port. This can be needed because of security policies or if a other service uses port 80, but it leads to errors afterwards. The vCenter service health status will show errors for the Storage Monitoring Service and the Storage Policy Service. I have expierenced that error during the Proof of Concept for the upgrade from vSphere 4.1 to 5.0 in the last year. I could solve that problem with the VMware Support and meanwhile they have published the Knowledge Base article Installing vCenter Server with custom HTTP ports… The reason for the problem is that the installer does not add the custom port to the configuration files of SMS and SPS. I show you how the errors look like and how to solve the problem.

The vCenter installer provides the option to change the connection ports of the vCenter services. In my example i have changed the http port to 82. You will receive a warning if you do this, but later the installer forgets it’s own warning.

After the installation you will SMS and SPS service health errors in the vCenter Service Status window.

Go into the SMS configuration folder to add the port to the url. In a default installation it is

C:\Program Files\VMware\Infrastructure\VirtualCenter Server\extensions\com.vmware.vim.sms

Open the extension.xml with an editor and add the custom port to the SMS service health url.

http://localhost/sms/health.xml to http://localhost:82/sms/health.xml

Go into the SPS configuration folder to add the port to the url. In a default installation it is

C:\Program Files\VMware\Infrastructure\VirtualCenter Server\extensions\com.vmware.vim.sps

Open the extension.xml with an editor and add the custom port to the SPS service health url.

http://localhost/sps/health.xml to http://localhost:82/sps/health.xml

Restart the vCenter services or reboot the vCenter machine.

net stop "VMware vSphere Profile-Driven Storage Service"
net stop "vCenter Inventory Service"
net stop "VMware VirtualCenter Management Webservices"
net stop "VMware VirtualCenter Server"
net stop "VMwareVCMSDS"

net start "VMwareVCMSDS"
net start "VMware VirtualCenter Server"
net start "VMware VirtualCenter Management Webservices"
net start "vCenter Inventory Service"
net start "VMware vSphere Profile-Driven Storage Service"

Check the vCenter Service Status window for the solved errors. All checks now should be green.

Let’s see if this problem is fixed in vSphere 5.1. I will do another test if 5.1 has been released…

2012/09/11 Update:

Today i have tested the vSphere 5.1 release to see if the problem is solved. As the screenshot shows. In a new installation of vSphere 5.1 the service status problems with a custom http port are gone.

http://localhost:8080/sms/health.xml and http://localhost:21200/sps/health.xml

are now configured by the installer.

2014/09/24 Update:

The Port bug is back for the SMS component in vCenter 5.5 U2. Maybe other versions between 5.1 and 5.5 U2 are also affected.

CPU wait time in percent with a vCOps super metric

2012-09-07T00:00:00+00:00

In the vSphere client performance tab of a virtual machine you can monitor the CPU Wait time, but it’s counted for every vCPU and no percentage metric is provided. vCenter Operations can solve that problem for you. With a super metric you can calculate how many percent of the time all vCPUs wait for something. vCenter 5.0 also provides the Idle and the IO-Wait time to show you what is the cause for the CPU wait time, but with vSphere 4.1 these metrics are only available in ESXTOP. DOC-9279 has a lot of background information about the ESXTOP and vSphere client performance tab metrics.

The vSphere client performance tab shows the CPU wait time in ms for all vCPUs of a virtual machine. vSphere 5 also shows you the sum of all vCPU wait times, but you are on your own to calculate how many percent of the time the vCPUs have been waiting.

With this formula vCOps takes the CPU Wait time metric divides it by the number of vCPUs, divides that by 20.000ms and by 100 to give you the percentage value.

(((sum($This:A518)/sum($This:A521))/20000)*100)

After a few minutes the metric graph displays the first percent values. Like every other metric you can use also this one in Heatmaps or other widgets, you can define KPI’s, hard thresholds and alerts for it.

In my post Using Super Metrics… you can find some background on how to configure and use super metrics.

Splunk behind a Apache reverse proxy

2012-08-27T00:00:00+00:00

In the last weeks Andreas has tested Splunk in our lab environment. To access the Frontend from outside the lab, i have configured Apache to work as a reverse proxy for Splunk.

Apache configuration


 
  ProxyRequests                 Off
  SSLProxyEngine                On
  ProxyPreserveHost             On
 
 
  
    ProxyPass                   https://1.2.3.4:8000/splunk retry=0
    ProxyPassReverse            retry=0

additional Documentation: mod_proxy – Apache HTTP Server
Splunk configuration in Splunk\etc\system\local\web.conf

[settings]
 enableSplunkWebSSL = 1
 root_endpoint = /splunk
 tools.proxy.on = True

additional Documentation

This setup is not perfect, because it needs SSL to be enabled on the Splunk web frontend. Maybe it is possible to use mod_rewrite to change the URLs between https and http.

As a conclusion some Splunk screenshots. I’m sure Andreas will come up with a few in depth posts about Splunk and monitoring XenApp environments. Collecting log data from vCenter and the ESX hosts is another great use case for Splunk…

automatic backup of USB flash drives with a folder action

2012-08-23T00:00:00+00:00

I use USB flash drives to store all the little tools, scripts and config files i need to switch between Mac, Linux and Windows. A Macbook pro is my main notebook and because of that i use it to make regular automated sync’s of my USB flash drives to the local disk. I use the automator and a bash script that starts rsync to do a incremental backup of the files on the USB drive.

Start the Mac OS Automator to create a new folder action. A folder action we be started if something changes inside a folder. If you plug in a USB flash drive, a folder with the Name of the drive will be created under /Volumes and the drive will be mounted under this directory. This triggers the folder action we will create now

Choose to create a new folder action in the new document dialog.

Open the folder dialog to select the /Volumes folder as the trigger for the folder action.

Because /Volumes is not visible in the finder and in the file dialogs you must use Shift + Command + G to open the go to folder dialog.

Now the Volumes folder shows up in the file dialog.

Input the bash script that will run rsync if the name of the USB flash drive matches on of the names in the USBNAMES array.

If you use Mountain Lion you can add a automator action for the Notification Center after the bash script that will run rsync if the name of the USB flash drive matches on of the names in the USBNAMES array. The automator action can be downloaded at [automatedworkflows.com](http://automatedworkflows.com.

USBNAMES=( BBO-4GB BBO-8GB )
RSYNC=/usr/bin/rsync
RSYNCOPT=-avh
BACKUPFOLDER=~/USBBACKUP
MOUNTFOLDER=/Volumes 
MOUNTS=( $MOUNTFOLDER/* )

## create backup folder if missing
if [ ! -d $BACKUPFOLDER ]
   then
        mkdir -p $BACKUPFOLDER
fi 

for folder in $MOUNTS
do
	for name in "${USBNAMES[@]}"
	do
		if [ $folder == "$MOUNTFOLDER/$name" ]
		then
			$RSYNC $RSYNCOPT $folder $BACKUPFOLDER --log-file=$BACKUPFOLDER/$name.log
		fi
	done
done

Save the folder action and give it a meaningful name.

If you connect the USB flash drive the workflow icon will show up in the status bar and it will disappear if the script has been finished.

After the workflow has finished you should see the new USBBACKUP folder in your home directory, the backup folder for the flash drive and the rsync logfile for that drive.

It is also possible to use Growl notifications to report the status of the workflow, but i used the Mountain Lion notifications after i have upgraded my macbook.

useful widgets to get fast answers in vCenter Operations

2012-08-23T00:00:00+00:00

vCenter Operations provides different widgets to display the data collected from vCenter, storage systems or other data sources. Using the example of finding out which of the VMs in a vCenter are affected by high cpu ready % values i show you how to use heatmaps, metric graphs and data distribution widgets.

Heatmaps

Heatmaps can display the current status of many objects (even thousands) in one clear view. It gives you a quick answer for which datastores have a high latency, which cluster have the most ballooning virtual machines, which ESX hosts are currently not available in which cluster and so on. This screenshot displays a heatmap configuration that shows all virtual machines in a vCenter, how many vCPUs they have configured (the size of the square) and how high their cpu ready % values are. If the values are higher than 5% the color changes from green to red (10% or more). As i have described in part 3 of “using super metrics…” you can create interactions between the widgets to “handover” the a metric from the selected object to another widget. If you click on a VM in the heatmap the “ready %” metric will be drawn in the metric graph and also in the data distribution widget if you configure both.

Metric graphs

Metric graphs draw a metric for one or more objects over a configured timeframe. They are good to understand the history and development of an object. For instance how high was the cpu usage of a ESX host or cluster a week ago, how fast does a datastore fill up with vmdk’s, how many lost packets had that network interface yesterday and so on. I have selected two VMs with different % ready values from the heatmap. In the metric graph i have combined both metrics, but it also possible to draw multiple graphs in one widget. This is useful if you have many metrics to compare or if you would zoom in a single graph. The screenshot shows a different history for the both virtual machines. The blue VM has very high values since august the ninth. Both run in the same vCenter, but maybe on different ESX hosts. The graph gives you an indication of what could be the problem. You can drill down into the virtual machine to see on which host it runs, check if the host has problems and if other VMs on that host are also affected.

Data distribution widgets

Data distribution widgets are more complex and harder to read, but they display how the values are distributed even over two timeframes. With them you can answer what was the cpu usage for a virtual machine most of the time in the last seven days and also for the last 30 days. The same is possible for IOps or latency of a datastore or the availability of a ESX host. Because i have selected the two virtual machines in the heatmap and i also have configured interactions for metric graphs and data distribution widgets, they will show up also in this widget. The screenshot shows the graph for both VMs and the configuration for seven and 30 days. The virtual machine on top has most of the ready % values between 20-25% in the last seven days and between 0-5% plus 20-25% in the last 30 days. This virtual machine has more problems in the last seven days then in the weeks before. If you compare it with metric graph you will see that this is the blue VM. The other VM has slightly better values in the last seven days than in the last 30 days and in that case it is more visible than in the metric graph. The Y axis in the graphs shows the percentage for the distribution of the values. For the top VM it means nearly 50% of the time the ready values in the last seven days were around 23.5% (the blue spike).

I think the vCenter Operations widgets prove that data visualization is very important and helpful to give educated answers about the health of your virtual infrastructure. With the integration of more data sources and automatic relationship configuration vCOps becomes even more valuable.

Sequencing vSphere Client V5 ends up with V4.1

2012-08-20T00:00:00+00:00

Recently I had to sequence the vSphere Client V5 Update 1 Client with App-V. What normally sounds like an easy task ended up with some trouble…

To create an unattended installation job just download the viclient.exe from the vCenter Server (https://vcenterserver) and start the binary. This extracts all MSI sources to your %TEMP% directory. Now you can copy this directory and create an installation batch which is executing an msiexec:

set src=%~dp0
start /wait msiexec /i “%src%extract\VMware vSphere Client 5.0.msi” /l*v “%temp%\VMware vSphere Client 5.0.log” /qb-

When you use this installation-batch for testing purposes on a local machine, everything runs smoothly and you get a clean installation. In the next step I tried to sequence this installation-batch to get an App-V package. To sequence you have to start the App-V Sequener gui and create a new package with a custom installation path or run the sequencer (like me) from the commandline:

“C:\Program Files (x86)\Microsoft Application Virtualization Sequencer x64\SFTSequencer.com” /PACKAGENAME:”vsphere-client5″ /INSTALLPACKAGE:”C:\temp\vsphere-client5\install.cmd” /INSTALLPATH:”Q:\vsphere-client5″ /OUTPUTFILE:”C:\app-v\vsphere-client5\vsphere-client5.sprj”

The install.cmd is the installation-batch mentioned before. When you now start the vSphere Client from the package and connect to a vSphere5 Server it tells you to update your vSphere Client. Remember: the same installation-batch without App-V was running fine. When I checked the startup-link for the client it looked like this:

As you see the vSphere Client is mentioned as version 4.1*, but comparing this to the local installation the correct version of “C:\Program Files (x86)\VMware\Infrastructure\Virtual Infrastructure Client\Launcher\vpxclient.exe” should be something like 5.0.*.

This proves that something is going wrong during the sequencing process. After some research on the web I found a Technet Discussion which seems to pinpoint the root cause: the vSphere Client installer is trying to create a proprietary USB Service and start it. This service runs natively on a system, but not within the App-V sequencing sandbox. The installer seems to check if the service has been started successfully and otherwise falls back to an old version.

As descripted in the discussion I created an MST and adjust my installationbatch:

start /wait msiexec /i “%src%extract\VMware vSphere Client 5.0.msi” TRANSFORMS=”%src%extract\without-usb.mst” /l*v “%temp%\VMware vSphere Client 5.0.log” /qb-

This will disable the USB service during the setup and ends up with the correct link:

You could create the mentioned MST yourself or download it here.

which seems to

pinpoint the cause.

Troubleshooting XenServer VM network performance

2012-08-20T00:00:00+00:00

Recently I had to troubleshoot network performance issues for VMs running on Citrix XenServer 6.0.2. All VMs were running W2k8R2 and it seems that CIFS copy jobs where incredible slow. The network itself couldn’t be the problem as all VMs were connected to the same virtual network on only one XenServer.

My favorite troubleshooting tool for these scenarios is netio123. It gives you a basic idea how much throughput you could get between two endpoints. It implements a server and a client in the same binary and is available for multiple OSs. Small, easy and straightforward…

In the first run I tested 2 windows VMs. The command “netio -t -p 5000 -s” will start the netio server on one VM on TCP/5000. To start the test from the client run “netio.exe -t -p 5000 ”.

C:\temp\netio123\bin>netio.exe -t -p 5000 192.168.15.73

NETIO – Network Throughput Benchmark, Version 1.26
(C) 1997-2005 Kai Uwe Rommel
TCP connection established.
Packet size  1k bytes:  8419 KByte/s Tx,  9423 KByte/s Rx.
Packet size  2k bytes:  8236 KByte/s Tx,  8851 KByte/s Rx.
Packet size  4k bytes:  16263 KByte/s Tx,  18310 KByte/s Rx.
Packet size  8k bytes:  32720 KByte/s Tx,  33743 KByte/s Rx.
Packet size 16k bytes:  63435 KByte/s Tx,  65853 KByte/s Rx.
Packet size 32k bytes:  116217 KByte/s Tx,  121351 KByte/s Rx.

As you see by the results i got less than 10MByte/s for small packages. Seems quite slow for a virtual network on one host. I fired up 2 LinuxVMs and rerun the test:

root@squeeze2:~/netio123/bin# ./linux-i386 -t -p 5000 192.168.15.78

NETIO – Network Throughput Benchmark, Version 1.26
(C) 1997-2005 Kai Uwe Rommel

TCP connection established.
Packet size  1k bytes:  331380 KByte/s Tx,  337781 KByte/s Rx.
Packet size  2k bytes:  352727 KByte/s Tx,  344394 KByte/s Rx.
Packet size  4k bytes:  324983 KByte/s Tx,  325345 KByte/s Rx.
Packet size  8k bytes:  332496 KByte/s Tx,  328502 KByte/s Rx.
Packet size 16k bytes:  348690 KByte/s Tx,  357080 KByte/s Rx.
Packet size 32k bytes:  369076 KByte/s Tx,  356453 KByte/s Rx.

I constantly got 300MByte/s even for small packages. This seems to be fine. So what happens when i run the test with a mixed Windows/Linux environment?

C:\temp\netio123\bin>netio.exe -t -p 5000 192.168.15.78

NETIO – Network Throughput Benchmark, Version 1.26
(C) 1997-2005 Kai Uwe Rommel

TCP connection established.
Packet size  1k bytes:  108210 KByte/s Tx,  170729 KByte/s Rx.
Packet size  2k bytes:  129609 KByte/s Tx,  173783 KByte/s Rx.
Packet size  4k bytes:  219375 KByte/s Tx,  202754 KByte/s Rx.
Packet size  8k bytes:  280427 KByte/s Tx,  205357 KByte/s Rx.
Packet size 16k bytes:  283376 KByte/s Tx,  206990 KByte/s Rx.
Packet size 32k bytes:  239445 KByte/s Tx,  206456 KByte/s Rx.

Wow, impressive 100MByte/s even for small packages. That’s 10x compared to WinVM-WinVM. Now it’s quite sure that the network performance issues are just related to the Windows VMs itself – and only when connecting to other Windows VMs. So Windows implements enhanced TCP features which you can display with the netsh command (e.g.: “netsh int tcp show global”). One basic troubleshooting step is to disable all these features and rerun the test. For disabling certain features run the following commands: “netsh int tcp set global chimney=disabled” for disable chimney, “netsh int tcp set global rss=disabled” for disable receive seite scaling… The breakthrough in my case was to disable autotuning by

netsh interface tcp set global autotuninglevel=disabled

C:\temp\netio123\bin>netio.exe -t -p 5000 192.168.15.73

NETIO – Network Throughput Benchmark, Version 1.26
(C) 1997-2005 Kai Uwe Rommel

TCP connection established.
Packet size  1k bytes:  94664 KByte/s Tx,  98326 KByte/s Rx.
Packet size  2k bytes:  99216 KByte/s Tx,  102267 KByte/s Rx.
Packet size  4k bytes:  99947 KByte/s Tx,  105185 KByte/s Rx.
Packet size  8k bytes:  98054 KByte/s Tx,  99594 KByte/s Rx.
Packet size 16k bytes:  99172 KByte/s Tx,  103267 KByte/s Rx.
Packet size 32k bytes:  98967 KByte/s Tx,  102743 KByte/s Rx.

This shows a 10x performance boost for small packages immediately. The sluggish copy jobs are now performing well. So, always make sure your VM network is running exactly. The small tool netio123 (you’ll easily find this on the web) can help you with some basic tests.

So, always make sure your VM network is running fine. The small tools netio123 (you’ll easiely find this on the web) could help you with some basic tests.

XenCenter is syncing endless with XenServer

2012-08-20T00:00:00+00:00

Some days ago I ran into the problem that XenCenter was not connecting to my XenServer correctly. The initial connect seems to be fine, but it got stuck at “synchronizing with …”.

Further investigation showed that the Server was accesible at TCP/443 and the client was connecting correctly, no hints in XenCenter or XenServer logs. I encountered the same issues with the XAPI Tool xe.exe and powershell commandlets.

I found it even more interesting that I could connect with XenCenter from other clients. The clients causing problems were VMs running inside the XenServer itself, while the other clients – as my local workstation – were connecting without complications. When starting to solve the problem at VM level, I noticed that the problem just starts after XenServer tools were installed.

I had a look in the vNIC driver settings (XenServer tools are updating these drivers at setup) and disabled Large Receive Offload (IPv4) setting. Et Voilà – issue solved!

This issue is related to XenServer Tools 6.0.2.

XenServer powershell pssnapins not loading correctly

2012-08-20T00:00:00+00:00

The other day I encountered a problem starting the XenServer PSSnapins.

The powershell error message told me “the Windows Powershell snap-in ‘XenServerPSSnapIn’ is not installed on this computer”

The explanation for this is quite easy: the XenServer powershell cmdlet’s are not(!) available for x64 platform. The batch file starting the XenServer Powershell Snapin starts powershell.exe using the C:\windows\system32… directory. Although this is valid on x86 platforms the directory has to be fixed on 64Bit platforms.

Simply Change C:\windows\system32\windowspowershell.. to C:\windows\SysWow64\windowspowershel… in “C:\Program Files (x86)\Citrix\XenServerPSSnapIn\XenServerPSSnapIn.bat”.

Hopefully Citrix will adjust the installer to check for the correct platform in future releases.

Get-DistributedSwitchPortGroup Name and Datacenter problems

2012-08-10T00:00:00+00:00

Earlier this year i have found two errors within the Get-DistributedSwitchPortGroup cmdlet from the book “The VMware vSphere PowerCLI Reference: Automating vSphere Administration”. The book is a great resource for automating vSphere environments and the scripts extend PowerCLI with the missing dvSwitch cmdlets and other useful stuff. Thanks for the work!

Table of Contents

Query portgroups on multiple dvSwitches
Query portgroups with similar names

1. Query portgroups on multiple dvSwitches and multiple datacenter

If you have configured more than one datacenter within a vCenter and you also have configured the portgroups with the same name on dvSwitches the cmdlet will return both portgroups. The reason is that the cmdlet has no option to filter by dvSwitch.

original code

Function Get-DistributedSwitchPortGroup
{
    <#
    .SYNOPSIS
        Get Distributed Virtual Port Groups (DVPG) by name or vDS.
    .DESCRIPTION
        Get Distributed Virtual Port Groups (DVPG) by name or vDS.
    .PARAMETER Name
        Name of the DVPG to retrieve supports wildcards.
    .PARAMETER DistributedSwitch
        Name of vDS to retrive the DVPG for.
    .EXAMPLE
        Get-DistributedSwitchPortGroup -Name PG02
    .EXAMPLE
        Get-DistributedSwitchPortGroup -DistributedSwitch vDS01
    #>
    [CmdletBinding()]
    param(
        [Parameter(Mandatory=$false
        ,   ValueFromPipelineByPropertyName=$true
        ,   ValueFromPipeline=$true)]
        [String]
        $NAME
    ,   [Parameter(Mandatory=$false
        ,   ValueFromPipelineByPropertyName=$true)]
        [String]
        $DistributedSwitch
    )
    Begin
    {
        $extraparams=@{}
        $extraparams["Property"] = @(
             'Name'
        ,    'Config.Description'
        ,    'Config.Type'
        ,    'Config.DefaultPortConfig'
        ,    'Config.DistributedVirtualSwitch'
        ,    'VM'
        ,    'PortKeys'
        ,    'AlarmActionsEnabled'
        )
        IF ($DistributedSwitch)
        {
            $vDSMoRef = Get-view -Property 'moref' `
                -ViewType "VmwareDistributedVirtualSwitch" `
                -filter @{'Name'=$DistributedSwitch}`
                -verbose:$false | 
                Select-Object -ExpandProperty MoRef
                Select-Object -ExpandProperty Value
            If ($Name)
            {
                $extraparams["filter"] = @{
                    'Name'=$Name
                    'Config.DistributedVirtualSwitch'="VmwareDistributedVirtualSwitch-$($vDSMoRef)"
                }
            }
            Else
            {
                $extraparams["filter"] = @{
                    'Config.DistributedVirtualSwitch'="VmwareDistributedVirtualSwitch-$($vDSMoRef)"
                    }
            }
        }
        If ($Name)
        {
            $extraparams["filter"] = @{'Name'=$Name}
        }
    }
    Process
    {
        get-view -ViewType  "DistributedVirtualPortgroup" -verbose:$false @extraparams |
            Select-Object @{
                Name='Name'
                Expression={$_.Name}
            },
            @{
                Name='Description'
                Expression={$_.Config.Description}
            },
            @{
                Name='PortBinding'
                Expression={$_.Config.Type}
            },
            @{
                Name='VLANID'
                Expression={(($_.Config.DefaultPortConfig.Vlan.VlanId|%{
                        if ($_ -match "\d+") {$_}
                        elseIf ($_.Start -eq $_.End) {$_.Start}
                        Else {"{0}-{1}" -f $_.Start,$_.End}}) -join ",")}
            },
            @{
                Name='NumbOfVMs'
                Expression={$_.Vm.count}
            },@{
                Name='NumofPorts'
                Expression={$_.PortKeys.count}
            },
            @{
                Name='AlarmActions'
                Expression={$_.AlarmActionsEnabled}
            },
            @{
                Name='DistributedSwitch'
                Expression={ Get-View $_.Config.DistributedVirtualSwitch `
                    -Property Name -verbose:$false | 
                    Select-Object -ExpandProperty Name}
            },
            @{
                Name='MoRef'
                Expression={ $_.MoRef}
            }
    }
}

modified code

Function Get-DistributedSwitchPortGroup
{
    <#
    .SYNOPSIS
        Get Distributed Virtual Port Groups (DVPG) by name or vDS.
    .DESCRIPTION
        Get Distributed Virtual Port Groups (DVPG) by name or vDS.
    .PARAMETER Name
        Name of the DVPG to retrieve supports wildcards.
    .PARAMETER DistributedSwitch
        Name of vDS to retrive the DVPG for.
    .EXAMPLE
        Get-DistributedSwitchPortGroup -Name PG02
    .EXAMPLE
        Get-DistributedSwitchPortGroup -DistributedSwitch vDS01
    #>
    [CmdletBinding()]
    param(
        [Parameter(Mandatory=$false
        ,   ValueFromPipelineByPropertyName=$true
        ,   ValueFromPipeline=$true)]
        [String]
        $NAME
    ,   [Parameter(Mandatory=$false
        ,   ValueFromPipelineByPropertyName=$true)]
        [String]
        $DistributedSwitch
    )
    Begin
    {
        $extraparams=@{}
        $extraparams["Property"] = @(
             'Name'
        ,    'Config.Description'
        ,    'Config.Type'
        ,    'Config.DefaultPortConfig'
        ,    'Config.DistributedVirtualSwitch'
        ,    'VM'
        ,    'PortKeys'
        ,    'AlarmActionsEnabled'
        )
        IF ($DistributedSwitch)
        {
            $vDSMoRef = Get-view -Property Name `
                -ViewType "VmwareDistributedVirtualSwitch" `
                -filter @{'Name'=$DistributedSwitch}`
                -verbose:$false | 
                Select-Object -ExpandProperty MoRef|Select-Object -ExpandProperty Value
            If ($Name)
            {
                $extraparams["filter"] = @{
                    'Name'=$Name
                    'Config.DistributedVirtualSwitch'="$($vDSMoRef)"

                }
            }
            Else
            {
                $extraparams["filter"] = @{
                    'Config.DistributedVirtualSwitch'="$($vDSMoRef)"
                    }
            }
        }
        If ($Name)
        {
            $extraparams["filter"] = @{'Name'=$Name}
        }
    }
    Process
    {
        get-view -ViewType  "DistributedVirtualPortgroup" -verbose:$false @extraparams |
            Select-Object @{
                Name='Name'
                Expression={$_.Name}
            },
            @{
                Name='Description'
                Expression={$_.Config.Description}
            },
            @{
                Name='PortBinding'
                Expression={$_.Config.Type}
            },
            @{
                Name='VLANID'
                Expression={(($_.Config.DefaultPortConfig.Vlan.VlanId|%{
                        if ($_ -match "\d+") {$_}
                        elseIf ($_.Start -eq $_.End) {$_.Start}
                        Else {"{0}-{1}" -f $_.Start,$_.End}}) -join ",")}
            },
            @{
                Name='NumbOfVMs'
                Expression={$_.Vm.count}
            },@{
                Name='NumofPorts'
                Expression={$_.PortKeys.count}
            },
            @{
                Name='AlarmActions'
                Expression={$_.AlarmActionsEnabled}
            },
            @{
                Name='DistributedSwitch'
                Expression={ Get-View $_.Config.DistributedVirtualSwitch `
                    -Property Name -verbose:$false | 
                    Select-Object -ExpandProperty Name}
            },
            @{
                Name='MoRef'
                Expression={ $_.MoRef}
            }
    }
}

2. Query portgroups with similar names

If you search for a portgroup like “VMOTION” the function also finds “VMOTION-DVUplinks-48″. With the patch the function only finds the exact name.

original code

Function Get-DistributedSwitchPortGroup
{
    <#
    .SYNOPSIS
        Get Distributed Virtual Port Groups (DVPG) by name or vDS.
    .DESCRIPTION
        Get Distributed Virtual Port Groups (DVPG) by name or vDS.
    .PARAMETER Name
        Name of the DVPG to retrieve supports wildcards.
    .PARAMETER DistributedSwitch
        Name of vDS to retrive the DVPG for.
    .EXAMPLE
        Get-DistributedSwitchPortGroup -Name PG02
    .EXAMPLE
        Get-DistributedSwitchPortGroup -DistributedSwitch vDS01
    #>
    [CmdletBinding()]
    param(
        [Parameter(Mandatory=$false
        ,   ValueFromPipelineByPropertyName=$true
        ,   ValueFromPipeline=$true)]
        [String]
        $NAME
    ,   [Parameter(Mandatory=$false
        ,   ValueFromPipelineByPropertyName=$true)]
        [String]
        $DistributedSwitch
    )
    Begin
    {
        $extraparams=@{}
        $extraparams["Property"] = @(
             'Name'
        ,    'Config.Description'
        ,    'Config.Type'
        ,    'Config.DefaultPortConfig'
        ,    'Config.DistributedVirtualSwitch'
        ,    'VM'
        ,    'PortKeys'
        ,    'AlarmActionsEnabled'
        )
        IF ($DistributedSwitch)
        {
            $vDSMoRef = Get-view -Property Name `
                -ViewType "VmwareDistributedVirtualSwitch" `
                -filter @{'Name'=$DistributedSwitch}`
                -verbose:$false | 
                Select-Object -ExpandProperty MoRef|Select-Object -ExpandProperty Value
            If ($Name)
            {
                $extraparams["filter"] = @{
                    'Name'=$Name
                    'Config.DistributedVirtualSwitch'="$($vDSMoRef)"

                }
            }
            Else
            {
                $extraparams["filter"] = @{
                    'Config.DistributedVirtualSwitch'="$($vDSMoRef)"
                    }
            }
        }
        If ($Name)
        {
            $extraparams["filter"] = @{'Name'=$Name}
        }
    }
    Process
    {
        get-view -ViewType  "DistributedVirtualPortgroup" -verbose:$false @extraparams |
            Select-Object @{
                Name='Name'
                Expression={$_.Name}
            },
            @{
                Name='Description'
                Expression={$_.Config.Description}
            },
            @{
                Name='PortBinding'
                Expression={$_.Config.Type}
            },
            @{
                Name='VLANID'
                Expression={(($_.Config.DefaultPortConfig.Vlan.VlanId|%{
                        if ($_ -match "\d+") {$_}
                        elseIf ($_.Start -eq $_.End) {$_.Start}
                        Else {"{0}-{1}" -f $_.Start,$_.End}}) -join ",")}
            },
            @{
                Name='NumbOfVMs'
                Expression={$_.Vm.count}
            },@{
                Name='NumofPorts'
                Expression={$_.PortKeys.count}
            },
            @{
                Name='AlarmActions'
                Expression={$_.AlarmActionsEnabled}
            },
            @{
                Name='DistributedSwitch'
                Expression={ Get-View $_.Config.DistributedVirtualSwitch `
                    -Property Name -verbose:$false | 
                    Select-Object -ExpandProperty Name}
            },
            @{
                Name='MoRef'
                Expression={ $_.MoRef}
            }
    }
}

modified code

Function Get-DistributedSwitchPortGroup
{
    <#
    .SYNOPSIS
        Get Distributed Virtual Port Groups (DVPG) by name or vDS.
    .DESCRIPTION
        Get Distributed Virtual Port Groups (DVPG) by name or vDS.
    .PARAMETER Name
        Name of the DVPG to retrieve supports wildcards.
    .PARAMETER DistributedSwitch
        Name of vDS to retrive the DVPG for.
    .EXAMPLE
        Get-DistributedSwitchPortGroup -Name PG02
    .EXAMPLE
        Get-DistributedSwitchPortGroup -DistributedSwitch vDS01
    #>
    [CmdletBinding()]
    param(
        [Parameter(Mandatory=$false
        ,   ValueFromPipelineByPropertyName=$true
        ,   ValueFromPipeline=$true)]
        [String]
        $NAME
    ,   [Parameter(Mandatory=$false
        ,   ValueFromPipelineByPropertyName=$true)]
        [String]
        $DistributedSwitch
    )
    Begin
    {
        $extraparams=@{}
        $extraparams["Property"] = @(
             'Name'
        ,    'Config.Description'
        ,    'Config.Type'
        ,    'Config.DefaultPortConfig'
        ,    'Config.DistributedVirtualSwitch'
        ,    'VM'
        ,    'PortKeys'
        ,    'AlarmActionsEnabled'
        )
        IF ($DistributedSwitch)
        {
            $vDSMoRef = Get-view -Property Name `
                -ViewType "VmwareDistributedVirtualSwitch" `
                -filter @{'Name'=$DistributedSwitch}`
                -verbose:$false | 
                Select-Object -ExpandProperty MoRef|Select-Object -ExpandProperty Value
            If ($Name)
            {
                $extraparams["filter"] = @{
                    'Name'="^$($Name)$"
                    'Config.DistributedVirtualSwitch'="$($vDSMoRef)"

                }
            }
            Else
            {
                $extraparams["filter"] = @{
                    'Config.DistributedVirtualSwitch'="$($vDSMoRef)"
                    }
            }
        }
        If ($Name)
        {
            $extraparams["filter"] = @{'Name'="^$($Name)$"}
        }
    }
    Process
    {
        get-view -ViewType  "DistributedVirtualPortgroup" -verbose:$false @extraparams |
            Select-Object @{
                Name='Name'
                Expression={$_.Name}
            },
            @{
                Name='Description'
                Expression={$_.Config.Description}
            },
            @{
                Name='PortBinding'
                Expression={$_.Config.Type}
            },
            @{
                Name='VLANID'
                Expression={(($_.Config.DefaultPortConfig.Vlan.VlanId|%{
                        if ($_ -match "\d+") {$_}
                        elseIf ($_.Start -eq $_.End) {$_.Start}
                        Else {"{0}-{1}" -f $_.Start,$_.End}}) -join ",")}
            },
            @{
                Name='NumbOfVMs'
                Expression={$_.Vm.count}
            },@{
                Name='NumofPorts'
                Expression={$_.PortKeys.count}
            },
            @{
                Name='AlarmActions'
                Expression={$_.AlarmActionsEnabled}
            },
            @{
                Name='DistributedSwitch'
                Expression={ Get-View $_.Config.DistributedVirtualSwitch `
                    -Property Name -verbose:$false | 
                    Select-Object -ExpandProperty Name}
            },
            @{
                Name='MoRef'
                Expression={ $_.MoRef}
            }
    }
}

unattended XenServer Installation without PXE

2012-08-08T00:00:00+00:00

As some of you might know: I’m a big fan of unattended installations – they are reproducible, portable and document themselves in a machine readable format. Normally these installations are triggered by some software distribution system (ESD). For OS installations (e.g. XenServer) you boot from network using with DHCP with PXE options to transfer the boot image via TFTP and the ESD jumps in.

There are good reasons why network installations might not work for you: ESD licensing, other PXE services in subnet, DHCP/PXE not implemented… etc.

If you want to create a full documented unattended installation of XenServer without network boot here is the way to go: instead of booting the installer from network you are booting from CD/ISO file. The installation binaries and answerfile (defining your installation parameters) is located on a network location (http,ftp,nfs).

Set up the installation repository

I use a Debian Linux with Apache webserver as installation repository. In the first step copy the whole content of the installation media to the repository.

# mount the Xenserver install media
mount -o loop XenServer-6.0.201-install-cd.iso /mnt/tmp
# create the repository within apache default website
mkdir /var/www/xs602

cp -R /mnt/tmp/ /var/www/xs602/
# do not forget to set permissions
chown -R www-data:www-data /var/www/xs602/

Create an installation answer file

Next you need an answer file for the XenServer installer. Here is mine:



extlinux
sda
de
xenserver01
putyourcleartextpasswordhere
http://192.168.1.4/xs602/

192.168.1.14
255.255.255.0
192.168.1.1

192.168.1.4
Europe/Berlin
ntp
10.1.1.10

run the Installer

As last step fire up the target system and boot CD/ISO File. At the bootscreen enter “menu.c32″ and use [TAB] to edit the installation string. Add “answerfile=http://yourwebserver/answerfile.xml install” after “console=ttyp0″.

That’s all, get yourself a cup of coffee and relax while your XenServer gets installed.

Even through this is not(!) the 100% automated installation (you need to boot from a CD/ISO file and type in a small string) it helps you to have a full documented and reproducible installation. Best thing is you do not need any kind of infrastructure – just a webserver for the installation repository.

Remove a installed vib Package from a ESX5i host

2012-07-26T00:00:00+00:00

With the vCenter Update Manager you cannot remove installed packages. For ESX5i one option is to use a few commandline steps in the local ESX shell.

Logon to ESX Host with SSH
query the package name:

esxcli software vib list |grep -i emc

remove the package

esxcli software vib remove -n EMCNasPlugin

put the host in maintenance mode

vim-cmd hostsvc/maintenance_mode_enter

reboot the host

reboot

run Scan for Updates on this Host in Vcenter to update the Status inside the Update Manager
Disable the Maintenance Mode for that Host

Monitoring Splunk Forwarder Client versions using Nagios

2015-02-19T00:00:00+00:00

In today’s article about Splunk monitoring we want to monitor the version of the Splunk components which are connecting to Forwarder Management. The former name of Forwarder Management was Deployment Server – which I personally prefer more as it not only configures and manages your Forwarders, but all Splunk components including Indexers and Search Heads.

Unfortunately the Forwarder Mangement WebGUI only displays the OS platform of the client but not the Software version.

With the Nagios plugin we want ensure that all your clients (Indexers, Search Heads, Forwarders) are running at least the defined Splunk version. Otherwise an error will be generated.

You need to have Forwarder Management implemented for this check. On the client you just need to point it to one Forwarder Management server, which can be any Splunk server in your environment. You can set the Forwarder to the Forwarder Management server with the command

$SPLUNK_HOME$/bin/splunk set deploy-poll bd20.bwlab.loc:8089

and check using

$SPLUNK_HOME%/bin/splunk list deploy-poll

The Nagios plugin queries Forwarder Management for the client list and compares every client against a minimum build level you can define. The plugin is a Powershell script communicating with the REST API of Splunk. For that reason the script has to be executed from a Windows device. That does not meanthe Splunk instance running the Forwarder Management role has to be installed on the Windows machine. If yourun Splunk on Linux or Mac you just need a Windows machine in your environment which executes the script against the non-Windows Splunk instance.

You can download the plugin from here. It uses some functions from the Splunk Powershell resource Kit which is also included in the download.

Setup monitoring using nsclient++ on Windows

In this example the Forwarder Management server runs on the same machine like the Nagios client nsclient++. If you need to do an indirect query, because your Splunk server is running on a non-Windows machine simply adjust the IP in the Nagios service definition.Download and extract the files to C:\Program Files\NSClient++\scripts\splunk

Download and extract the files to C:\Program Files\NSClient++\scripts\splunk

Adjust your “C:\Program Files\NSClient++\nsclient.ini” and add the external script

[/settings/external scripts/scripts]
splunkfwmanagementversion = cmd /c echo scripts\\splunk\\check-deploymentclientsversion.ps1 -servername $ARG1$ -username $ARG2$ -password $ARG3$ -minbuild $ARG4$; exit($lastexitcode) | powershell.exe -command –

On the Nagios server: add a service to your host definition

use generic-service ; Name of service template to use
host_name bd20.bwlab.loc
service_description Splunk FW Management Clients Version
check_command nt_nrpe_splunkfwmanagementversion!localhost!admin!mypassword!220630
}

After reloading the Nagios config you should verify the status of the check. It should look like this if everything is running smoothly.

Parameters

You can also run the PowerShell script manually for testing. The script accepts multiple parameters

-servername
Servername or IP address of the Deployment Server/Forwarder Management

 -port
Port of splunkd – default 8089

-protocol
Protocol to use to communicate with splunkd – default: https

-timeout
Connection timeout to splunkd in milliseconds – default 5000

-username
Username to use to login to splunkd

-password
Password to use with splunkd

-minbuild
Build version of client to check. has to be passed as an integer value. if client runs on a lower build version a critical message will be is generated
example build numbers
version 4.3.3 = build 128297
version 6.0.2 = build 196940
version 6.1.1 = build 207789
version 6.1.3 = build 220630
version 6.1.4 = build 233537
version 6.1.5 = build 239630
version 6.2.0 = build 237341
version 6.2.1 = build 245427

Monitoring Splunk Forwarder Management Clients using Nagios

2015-02-18T00:00:00+00:00

There are a few things you want to monitor in a production Splunk environment. I’m planning to release a few articles about basic Splunk monitoring. I’m checking our environment using Nagios, but the scripts should also work without any major adjustments for other monitoring solutions like Microsoft SCOM, Zappix or Openview as they all work in the same way.

If you use Forwarder Management (also known as Deployment Server) to configure your infrastructure, you really want to make sure your Clients/Forwarders are up-and-running. In the Splunk Webpage you have a page for this within Settings->Forwarder Management:

To ensure that a client is pointed to the Deploymentserver check the configuration in $SPLUNK_HOME$/etc/system/local/deploymentclient.conf or run the “splunk show deploy-poll” command. To set the Forwarder Management Server use “splunk set deploy-poll SERVER:8089″.

By default a client will call back Forwarder Management Server every 60 seconds. If communication fails the output looks like this:

The phone home interval can be configured in $SPLUNK_HOME$/etc/system/local/deploymentclient.conf using the phoneHomeinvervalinSecs Parameter.

The Nagios plugin asks the Forwarder Management if every client has phoned back correctly. The plugin is a Powershell script communicating with the REST API of Splunk. For that reason the script has to be executed from a Windows device. That does not mean the Splunk instance running the Forwarder Management role has to be installed on the Windows machine. If you run Splunk on Linux or Mac you just need a Windows machine in your environment which executes the script against the non-Windows Splunk instance.

You can download the plugin from here. It uses some functions from the Splunk Powershell resource Kit which is also included in the download.

Setup monitoring using nsclient++ on Windows

Download and extract the files to C:\Program Files\NSClient++\scripts\splunk

Adjust your “C:\Program Files\NSClient++\nsclient.ini” and add the external script

[/settings/external scripts/scripts]
splunkfwmanagement = cmd /c echo scripts\\splunk\\check-deploymentclients.ps1 -servername $ARG1$ -username $ARG1$ -password $ARG2$ -warn $ARG3$ -critical $ARG4$; exit($lastexitcode) | powershell.exe -command –

On the Nagios server: create a new command using NRPE

# ‘nt_nrpe_splunkfwmanagement’ command definition
define command{
command_name            nt_nrpe_splunkfwmanagement
command_line            /usr/lib/nagios/plugins/check_nrpe -t 30 -H  $HOSTADDRESS$ -p 5666 -c splunkfwmanagement -a $ARG1$ $ARG2$ $ARG3$ $ARG4$ $ARG5$
}

On the Nagios server: add a service to your host definition

define service{
use                             generic-service         ; Name of service template to use
host_name                       bd20.bwlab.loc
service_description             Splunk FW Management Client Connectivity
check_command                   nt_nrpe_splunkfwmanagement!localhost!admin!mypassword!5!30
}

After reloading the Nagios config you should verify the status of the check. It should look like this if everything is running smoothly.

In case of an error it will look like this:

Parameter and Troubleshooting

You can also run the PowerShell script manually for testing. The script accepts multiple parameters:

-servername
Servername or IP address of the Deployment Server/Forwarder Management

-port
Port of splunkd – default 8089

-protocol
Protocol to use to communicate with splunkd – default: https

-timeout
Connection timeout to splunkd in milliseconds -  default 5000

-username
Username to use to login to splunkd

-password
Password to use with splunkd

-warn
time in seconds (default 5) which a client is allowed to overdue before a warning is generated, depends on configured phoneHomeIntervalInSecs (default 60) in client settings

-critical
time in seconds (default 300) which a client is allowed to overdue before a critical is generated, depends on configured phoneHomeIntervalInSecs (default 60) in client settings

Running XenMobile MDX Toolkit on Windows and virtualized Macs

2015-02-18T00:00:00+00:00

It’s a well-known rumor in the Citrix XenMobile world, that if you want to start with XenMobile and Appcontroller, you have to buy a Mac. The reason for that is the Citrix MDX Toolkit only runs on Mac OS X on physical hardware. I decided give it a try on a virtualized Mac and Windows.

Wrapping Applications on a virtualized Mac

For wrapping/creating .mdx files on a Mac you’ll need to install JDK1.7, XCode, Xcode Command Line Tools and Citrix MDX Toolkit.

This is how MDX Toolkit looks like when running it in a VM:

compared to a non-virtual mac:

It looks quite unusable. But hey, MDX Toolkit is just a GUI application triggering a command line. For IOS it is triggering the CGAppCLPrepTool application with some parameters. Let’s check if the command line works:

./CGAppCLPrepTool Wrap -Cert “iPhone Distribution: Joe Public (ABCDEF1234)” -Profile “citrix_distribution.mobileprovision” -in “myapplication.ipa” -out “myapplication.mdx” -appdesc “doing this stuff from commandline”

As a result you’ll get the expected .mdx file – Upload this file to Appcontroller and it will work.

Wrapping Android Applications on Windows

For wrapping applications for Android you’ll need to have JRE, Android SDK and MDX Toolkit installed. Let’s check what happens if you copy the whole /Applications/Citrix/MDX Toolkit Folder to a Windows machine.

First we need to create a Keystore:

“c:\Program Files\Java\jdk1.7.0_67\bin\keytool.exe” -genkey -dname “cn=Android, o=Android, c=US” -keystore C:\temp\demo.keystore -storepass android -alias wrapkey -keypass android -keysize 1024 -sigalg SHA1withRSA -keyalg RSA

Next step is to wrap the application – I’m wrapping the Citrix Worx Mail app in this example:

“c:\Program Files\Java\jdk1.7.0_67\bin\java.exe” -jar C:\temp\MDXToolkit\ManagedAppUtility.jar wrap -in c:\temp\apps\CitrixEmail9.0-release.apk -out c:\temp\apps\CitrixEmail9.0-release.mdx -keystore C:\temp\my.keystore -storepass android -keyalias wrapkey -keypass android

No errors – looks fine, worx|ks fine.. and means you can wrap Android applications on Windows. You just need somebody who will install the MDX Toolkit one time and provide you the extracted installation files.