Overview of Splunk versions and default tsidxWritingLevel:
Splunk Version | available tsidxWritingLevel | default tsidxWritingLevel |
---|---|---|
v8.2.0 | 1,2,3,4 | 2 |
v8.1.0 | 1,2,3,4 | 1 |
v8.0.0 | 1,2,3 | 1 |
v7.3.0 | 1,2,3 | 1 |
v7.2.0 | 1,2 | 1 |
v7.1.0 | no setting, 1 assumed | 1 |
So if you want to benefit from the latest storage and performance improvements in Splunk Enterprise, you have to increase this setting. As I haven't found any reliable numbers besides "up to 40% reduced storage"* on what an increase of the parameter means in the real world, I decided to test it myself.
When changing this setting, only new buckets will be created with the higher level. Old data that was produced with a lower level will not be converted.
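For reference, the setting lives in indexes.conf and can be set globally under [default] or per index. A minimal sketch, assuming an index named main:
[main]
# write new buckets with the newest tsidx format (level 4 requires Splunk 8.1+)
tsidxWritingLevel = 4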
I created a test where I start up a single instance on AWS, feed it with some logs, capture the time taken and the size of the tsidx files, and repeat three times for every tsidxWritingLevel to validate the results.
Test results:
run number | tsidxWritingLevel | time taken ingest | Bucket sizeOnDisk (MB) | load avg | 1min load avg |
---|---|---|---|---|---|
1 | 1 | 95s | 519.89 | 3.95, 1.68, 1.56 | 4.05 |
2 | 1 | 100s | 519.25 | 4.23, 2.57, 1.9 | 4.05 |
3 | 1 | 95s | 525.32 | 3.97, 2.98, 2.14 | 4.05 |
1 | 2 | 100s | 475.30 | 4.19, 3.29, 2.36 | 4.47 |
2 | 2 | 100s | 475.89 | 4.61, 3.68, 2.63 | 4.47 |
3 | 2 | 95s | 472.64 | 4.62, 3.86, 2.83 | 4.47 |
1 | 3 | 105s | 461.16 | 4.17, 3.83, 2.96 | 4.29 |
2 | 3 | 100s | 452.96 | 3.9, 3.71, 3.03 | 4.29 |
3 | 3 | 95s | 450.67 | 4.79, 3.97, 3.21 | 4.29 |
1 | 4 | 105s | 403.32 | 4.35, 3.92, 3.29 | 4.07 |
2 | 4 | 100s | 413.07 | 3.85, 3.82, 3.34 | 4.07 |
3 | 4 | 100s | 407.33 | 4.01, 3.83, 3.4 | 4.07 |
average tsidx storage needed:
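Computed from the run averages in the table above:
level 1: (519.89 + 519.25 + 525.32) / 3 ≈ 521.5 MB
level 2: (475.30 + 475.89 + 472.64) / 3 ≈ 474.6 MB
level 3: (461.16 + 452.96 + 450.67) / 3 ≈ 454.9 MB
level 4: (403.32 + 413.07 + 407.33) / 3 ≈ 407.9 MB (≈ 21.8% less than level 1)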
More or less as expected, the highest tsidxWritingLevel=4 showed the biggest storage optimization: for this test case, ~20% less disk space is needed for the index files. Note that these numbers only apply to the given dataset; your results may vary.
As we see, the ingest time seems slightly, but not noticeably, higher. The load avg column lists the 1min, 5min and 15min values for documentation purposes; it is expected that the 5 and 15 min load rises over the course of the test. In general, the average 1min load is quite comparable across levels.
For metrics, we created a sample CSV of 15 million events containing 15 metrics with different values and 3 static dimensions. The resulting rawdata is 810 MB.
run number | tsidxWritingLevel | time taken ingest | Bucket sizeOnDisk (MB) | load avg |
---|---|---|---|---|
1 | 1 | 95s | 205.51 | 2.57, 1.34, 0.66 |
2 | 1 | 95s | 204.38 | 2.96, 1.89, 0.95 |
1 | 2 | 95s | 203.97 | 3.61, 2.4, 1.26 |
2 | 2 | 95s | 205.30 | 3.13, 2.55, 1.47 |
1 | 3 | 95s | 204.72 | 2.82, 2.57, 1.62 |
2 | 3 | 95s | 204.36 | 2.93, 2.69, 1.8 |
1 | 4 | 95s | 204.18 | 2.75, 2.61, 1.89 |
2 | 4 | 95s | 204.53 | 2.82, 2.58, 1.97 |
As we see, there is almost no difference in the resulting bucket sizes: all results vary by less than 1%, and all runs have the same ingest time.
We learned that the tsidxWritingLevel has no impact on the storage size of metric indexes so far.
Here you find a link to the git repo where the tests are documented. You can adjust the config.yml file to create your own tests with your own data.
There is a bug affecting versions v8.1.1, v8.1.2 and v8.1.3 (fixed in v8.1.4) as well as v8.2.0 (fixed in v8.2.1), documented as SPL-197930: indexers show huge memory spikes and may crash when tsidxWritingLevel = 4 is set (see link).
Even if "| delete" is not a very common command, it's used from time to time to clean up unwanted events. So what happens if you delete data by mistake? How do you recover those events when the docs say it's not possible?
When we look at the documentation, it states that "Removing data is irreversible. If you want to get your data back after the data is deleted, you must re-index the applicable data sources." and "Using the delete command marks all of the events returned by the search as deleted." Re-indexing is quite often impossible when the data comes from transient sources like MQTT or REST interfaces.
Well, what if we find these markers and remove them? Normally this should bring us to a situation where the data is searchable again, right?!
All event data in Splunk is stored in indexes. Every index consists of buckets, which are folders with a predefined naming convention. Let’s have a look at those buckets and compare a bucket with deleted and non-deleted data.
Let’s search for some data. Please note that the internal field _bkt is the bucket where an event is stored.
splunk search "index=test sourcetype=testjson earliest=0 | stats count values(_bkt) as bucket values(index) as index | table index count bucket"
index count bucket
----- ----- -------------------------------------------
test 200 test~5~8318D59B-46EF-45B4-ACFA-AB89AAF73434
Here we find 200 events of sourcetype testjson in the index "test" in the bucket test~5~8318D59B-46EF-45B4-ACFA-AB89AAF73434. Let's jump into the filesystem structure and find this bucket/folder.
db_1623401960_1623401955_5
├── 1623401960-1623401955-14536582616925630097.tsidx
├── Hosts.data
├── SourceTypes.data
├── Sources.data
├── bloomfilter
├── bucket_info.csv
├── optimize.result
└── rawdata
├── 0
└── slicesv2.dat
total 56
-rw------- 1 andreas staff 4640 11 Jun 11:03 1623401960-1623401955-14536582616925630097.tsidx
-rw------- 1 andreas staff 103 11 Jun 11:03 Hosts.data
-rw------- 1 andreas staff 103 11 Jun 11:03 SourceTypes.data
-rw------- 1 andreas staff 94 11 Jun 11:03 Sources.data
-rw------- 1 andreas staff 49 11 Jun 11:03 bloomfilter
-rw------- 1 andreas staff 67 11 Jun 10:59 bucket_info.csv
-rw------- 1 andreas staff 0 11 Jun 11:03 optimize.result
drwx------ 4 andreas staff 128 11 Jun 11:03 rawdata
This is what a bucket looks like: the rawdata subdirectory contains the original events in a compressed format. The *.tsidx files are the index over those rawdata events. The *.data files hold meta information about the source, sourcetype and host fields of the rawdata.
We run all commands from the CLI, as this might be easier to read in the article. Now let's delete some data using the "| delete" command.
splunk search "index=test sourcetype=testjson earliest=0 | delete"
INFO: 200 events successfully deleted
splunk_server index deleted errors
-------------------------- ------- ------- ------
andreass-MacBook-Pro.local __ALL__ 200 0
and search for the data again:
splunk search "index=test sourcetype=testjson earliest=0 | stats count"
count
-----
0
It seems as if the data is now deleted, or rather "marked as deleted" successfully.
Now let’s run tree and ls again:
db_1623401960_1623401955_5
├── 1623401960-1623401955-14536582616925630097.tsidx
├── Hosts.data
├── SourceTypes.data
├── Sources.data
├── bloomfilter
├── bucket_info.csv
├── optimize.result
└── rawdata
├── 0
├── deletes
│ └── 8c1659e22188a580759cbf34a6e26308.csv.gz
└── slicesv2.dat
total 56
-rw------- 1 andreas staff 4640 27 Sep 18:07 1623401960-1623401955-14536582616925630097.tsidx
-rw------- 1 andreas staff 103 11 Jun 11:03 Hosts.data
-rw------- 1 andreas staff 103 11 Jun 11:03 SourceTypes.data
-rw------- 1 andreas staff 94 11 Jun 11:03 Sources.data
-rw------- 1 andreas staff 49 11 Jun 11:03 bloomfilter
-rw------- 1 andreas staff 67 11 Jun 10:59 bucket_info.csv
-rw------- 1 andreas staff 0 11 Jun 11:03 optimize.result
drwx------ 5 andreas staff 160 27 Sep 18:07 rawdata
Well, let's have a closer look at the bucket's 1623401960-1623401955-14536582616925630097.tsidx file. From the timestamp we see that this file was modified when we deleted the data. Let's try to rebuild this .tsidx file: we stop Splunk, remove the file, and run a rebuild using the "splunk fsck repair" command.
splunk stop
rm -f /Users/andreas/splunk/var/lib/splunk/test/db/db_1623401960_1623401955_5/1623401960-1623401955-14536582616925630097.tsidx
splunk fsck repair --one-bucket --bucket-path=/Users/andreas/splunk/var/lib/splunk/test/db/db_1623401960_1623401955_5
splunk start
and search again:
splunk search "index=test sourcetype=testjson earliest=0 | stats count"
count
-----
0
No luck – the data is still not searchable. Looks like we have overlooked something.
Let's have a closer look at the bucket. See the "deletes" subdirectory under rawdata? This directory and its contents were also created when we deleted the data. Now, let's stop Splunk, remove the "deletes" subdirectory and repair the bucket again.
splunk stop
rm -rf ~/splunk/var/lib/splunk/test/db/db_1623401960_1623401955_5/rawdata/deletes
splunk fsck repair --one-bucket --bucket-path=/Users/andreas/splunk/var/lib/splunk/test/db/db_1623401960_1623401955_5
splunk start
Voila, the data is available again.
splunk search "index=test sourcetype=testjson earliest=0 | stats count"
count
-----
200
When running the "| delete" command, Splunk actively changes the .tsidx index files to ensure the deleted data can no longer be searched.
Besides that, the subdirectory "deletes" with markers for the deleted events is created in rawdata. Those markers come into play when the bucket is recreated from rawdata. Such a recreation takes place when you thaw a frozen bucket from an archive or make replicated buckets searchable using index clustering. For that reason the first recovery attempt failed: the "deletes" directory just marked the events in the index as "deleted" again.
This mechanism makes Splunk Enterprise consume more storage when you delete data.
If you just restore the .tsidx file from your backup, the events are immediately searchable again – even without a Splunk restart. BUT if you do not clean up the "deletes" directory, the events will be marked as deleted again at the next index rebuild. So be aware when recovering buckets from backup: always recover the entire bucket and ensure the "deletes" subdirectory is also deleted!
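A minimal sketch of a safe restore, assuming the backup holds the complete bucket directory under a hypothetical /backup path and using the bucket from the example above:
splunk stop
# restore the complete bucket, not just the .tsidx file
rm -rf ~/splunk/var/lib/splunk/test/db/db_1623401960_1623401955_5
cp -a /backup/db_1623401960_1623401955_5 ~/splunk/var/lib/splunk/test/db/
# make sure no stale delete markers survive the restore
rm -rf ~/splunk/var/lib/splunk/test/db/db_1623401960_1623401955_5/rawdata/deletes
splunk start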
Happy backup & restore!
brew install hugo
hugo new site --source ./ test.batchworks.de
mkdir -p test.batchworks.de/themes/hugo-geekdoc/
curl -sL https://github.com/thegeeklab/hugo-geekdoc/releases/download/v0.19.1/hugo-geekdoc.tar.gz | tar -xz -C test.batchworks.de/themes/hugo-geekdoc/
baseurl = "http://test.batchworks.de/"
languageCode = "en-us"
title = "test.batchworks.de"
theme = "hugo-geekdoc"
[permalinks]
posts = "/:title/"
page = "/:slug/"
## geekdoc theme settings
# Required to get well formatted code blocks
pygmentsUseClasses = true
pygmentsCodeFences = true
disablePathToLower = true
enableGitInfo = false
[markup]
[markup.goldmark.renderer]
unsafe = true
[markup.tableOfContents]
startLevel = 1
endLevel = 9
[taxonomies]
author = "authors"
tag = "tags"
## geekdoc theme settings end
# Theme variables
#
[params]
# Site author
author = "Birk Bohne"
geekdocBreadcrumb = false
# Format dates with Go's time formatting
date_format = "Mon Jan 02, 2006"
---
title: Welcome to the test site
geekdocDescription: This is the start page.
weight: 10
---
# The start page
Welcome to the start page
hugo server --source test.batchworks.de/ --baseURL http://localhost
Start building sites …
hugo v0.88.1+extended darwin/amd64 BuildDate=unknown
| EN
-------------------+------
Pages | 7
Paginator pages | 0
Non-page files | 0
Static files | 105
Processed images | 0
Aliases | 2
Sitemaps | 1
Cleaned | 0
Built in 31 ms
---
title: Sub content
geekdocDescription: focus on other topics
weight: 10
---
the sub page
---
title: Mermaid charts
geekdocDescription: render charts with mermaid markup code
weight: 10
---
{{< mermaid class="text-center">}}
flowchart TD
A[Start] --> B{Is it?};
B -->|Yes| C[OK];
C --> D[Rethink];
D --> B;
B ---->|No| E[End];
{{< /mermaid >}}
{{ $arg0 := .Get 0 }}
{{ $data := index .Site.Data.content $arg0 }}
{{ $.Scratch.Set "count" 0 }}
<table>
<thead>
<tr>
<th>Name</th>
<th>Function</th>
</tr>
</thead>
<tbody>
{{ range $datacontent := $data }}
<tr>
<td>{{ $datacontent.name }}</td>
<td>{{ $datacontent.function }}</td>
</tr>
{{ end }}
</tbody>
</table>
- name: "json"
function: "read JSON"
- name: "csv"
function: "read CSV"
---
title: Data tables
geekdocDescription: render tables
weight: 20
---
## Data sources
{{< data_table "data" >}}
mkdir -p tmp/ www/
hugo --source ${PWD}/test.batchworks.de/ --cacheDir ${PWD}/tmp --destination ${PWD}/www --baseURL http://test.batchworks.de
Further information can be found on Splunkbase (https://splunkbase.splunk.com/app/4005/) and in the GitHub repo located at https://github.com/schose/collectd2.
Development takes place in the git repo hosted at https://git.batchworks.de/andreas/TA-routeros. You can download it from there or from https://splunkbase.splunk.com/app/3845/.
Data is extracted for the Splunk CIM data models network traffic, name resolution (DNS), DHCP and authentication.
As I couldn't imagine that something with the abbreviation "XML" in it could be "small" or "fast", I decided to do a test.
Interestingly enough, there is a blog article at http://blogs.splunk.com/2014/11/04/splunk-6-2-feature-overview-xml-event-logs/ also stating that you would get a data reduction.
You start collecting XML events by adding renderXml = 1 to the input stanza. When doing so, suppress_text = 1 is automatically set. Of course, you could also omit the Eventlog message for your non-XML input and achieve the same volume reduction; here I keep the Eventlog message for both the XML and non-XML scenarios to make sure I'm not comparing apples and oranges.
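For illustration, a minimal inputs.conf stanza; the Application channel is just an example:
[WinEventLog://Application]
disabled = 0
renderXml = 1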
Index performance
The same dataset was indexed by the same forwarder on the same hardware. Let's determine the time needed for indexing:
index=noxml OR index=xml | stats count earliest(_indextime) as ite latest(_indextime) as itl by index | eval timediff = itl-ite | convert ctime(ite) ctime(itl)
Indexing the XML took 195 seconds vs. 153 seconds – 27.5% longer.
Size
Determine the index size:
| dbinspect index=noxml OR index=xml | table index sizeOnDiskMB
XML needed 17.7% more storage.
Search performance
Running a simple search, I can't remember whether I was more surprised that the XML search was more than 10x slower or that it showed a different result count. I repeated each search 3 times to ensure the results are accurate.
index=noxml | stats count by EventCode – (fast mode enabled)
This search has completed and has returned 223 results by scanning 56,807 events in 2.923 seconds.
This search has completed and has returned 223 results by scanning 56,807 events in 2.91 seconds.
This search has completed and has returned 223 results by scanning 56,807 events in 2.888 seconds.
index=xml | stats count by EventCode – fast mode
This search has completed and has returned 106 results by scanning 56,807 events in 33.822 seconds.
This search has completed and has returned 106 results by scanning 56,807 events in 33.593 seconds.
This search has completed and has returned 106 results by scanning 56,807 events in 33.309 seconds.
All the time was spent in command.search.kv.
I found a lot of events where the EventID is not extracted correctly from the XML:
index=xml OR index=noxml sourcetype=*application* RecordNumber=9172 | table EventCode sourcetype index _raw
All tests were done with the "latest and greatest" Splunk TA Windows v4.8.3 running on Splunk Enterprise v6.5.
Summary
Never ever use XML rendering in the hope of better performance or reduced data volume. For now, the only valid reason seems to be overcoming language issues.
To index events from an RDBMS there is Splunk's well-known DB Connect app (https://splunkbase.splunk.com/app/2686/). Unfortunately, the DB Connect support matrix doesn't mention the H2 database – so I decided to test it out.
H2 Database
I had never run into H2 before; it really seems to be a niche product. The installation consists of downloading and extracting a .zip file – awesome! It is only 1.5 MB in size and has a great feature set including an in-memory mode and built-in clustering/replication…
http://www.h2database.com/html/features.html#comparison
By default H2 has two connection modes: an embedded/local mode and a server mode (TCP).
DB Connect setup
As always, you need to install Java 8 for DB Connect. Even if OpenJDK works fine, I always recommend using Oracle Java for support reasons. Extract DB Connect to your $SPLUNK_HOME$/etc/apps path and run the setup wizard if you have a full Splunk installation; otherwise you can edit app.conf and inputs.conf to enable it and set the JRE path correctly.
app.conf
[install]
is_configured = 1
inputs.conf
[rpcstart://default]
javahome = /usr/local/jre1.8.0_111
useSSL = 0
The full installation procedure is documented in the Splunk docs at http://docs.splunk.com/Documentation/DBX/2.4.0/DeployDBX/Checklist.
Next, you need to download the H2 database (http://www.h2database.com/h2-2016-10-31.zip), extract it and copy bin/h2-1.4.193.jar to the $SPLUNK_HOME$/etc/apps/splunk_app_db_connect/bin/lib directory.
Next, configure a custom DB type by creating the config file $SPLUNK_HOME$/etc/apps/splunk_app_db_connect/local/db_connection_types.conf. This is not implemented in the DB Connect web GUI.
db_connection_types.conf:
[h2tcp]
displayName = H2-tcp
serviceClass = com.splunk.dbx2.DefaultDBX2JDBC
jdbcUrlFormat = jdbc:h2:tcp://<host>:<port>/<database>
jdbcDriverClass = org.h2.Driver
[h2local]
displayName = H2-local
serviceClass = com.splunk.dbx2.DefaultDBX2JDBC
jdbcUrlFormat = jdbc:h2:<database>
jdbcDriverClass = org.h2.Driver
The [h2tcp] stanza defines the connection for server mode, while [h2local] defines the embedded/local mode. After doing so and restarting Splunk, you'll see two new driver entries in DB Connect – stating "unsupported".
Create credentials first, followed by a connection. Make sure to use TCP/9092 when connecting to a remote H2 instance. The remote instance has to be started using the -tcpAllowOthers parameter.
A new connection will be saved in db_connections.conf. This is an example:
[h2remote]
connection_type = h2tcp
database = /tmp/h2demo
host = 127.0.0.1
identity = sa
jdbcUrlFormat = jdbc:h2:tcp://<host>:<port>/<database>
jdbcUseSSL = 0
port = 9092
Defining an input to pull events out of the database is done, as always, in inputs.conf. Here is an example:
inputs.conf
[mi_input://h2remote-users]
connection = h2remote
enable_query_wrapping = 1
index = test_high
interval = 60
max_rows = 10000
mode = tail
output_timestamp_format = yyyy-MM-dd HH:mm:ss
query = SELECT * FROM INFORMATION_SCHEMA.USERS
sourcetype = dbx:h2
tail_rising_column_name = ID
ui_query_mode = advanced
tail_rising_column_checkpoint_value = 2
H2 restrictions
Contrary to what you might expect, it's not possible to have two applications writing or reading in local/embedded mode. You'll receive the message "org.h2.jdbc.JdbcSQLException: Database may be already in use: null. Possible solutions: close all other connection(s); use the server mode [90020-193]". This is by design and can be solved, as mentioned before, by starting H2 in server mode with -tcp for local-only connections or -tcpAllowOthers for all other connections.
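For reference, the server mode is started via the org.h2.tools.Server class shipped in the same jar; a sketch using the jar version from above:
java -cp h2-1.4.193.jar org.h2.tools.Server -tcp -tcpAllowOthers -tcpPort 9092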
There are different situations when a bucket is rolled from hot to warm:
Here's an overview of the indexes.conf parameters:
As you see, there is no time-based but only a size-based threshold for when Splunk rolls a bucket from hot to warm. There are situations where you want to roll a bucket from hot to warm manually. Here is a hot bucket in an index named "bwindex" with the ID 37.
You can force Splunk to roll the bucket using this command:
splunk _internal call /data/indexes/INDEXNAME/roll-hot-buckets -auth admin:password
where INDEXNAME is the name of the index to roll.
You can also trigger the rolling on a remote indexer using curl:
curl -k https://localhost:8089/services/data/indexes/bwindex/roll-hot-buckets -X POST -u admin:password
After that we check the hot buckets again and see that there is a new hot bucket with ID 38. The old bucket has been renamed to db_timestamp_timestamp_ID.
There might be situations in the real world where you want to roll hot buckets manually. In one case there was an index replication cluster where the search factor and replication factor weren't met. The error message was "Cannot fix search count as the bucket hasn't rolled yet". As this could take up to 90 days to resolve on its own (if the data volume is small enough), we wanted to force the indexer to roll the bucket.
Here is a screenshot:
Splunk v6.4
There is an easy-to-overlook feature in Splunk 6.4 named "Force roll specific hot buckets". You can find the documentation here. There is a new REST endpoint /services/cluster/master/control/control/roll-hot-buckets – which makes your life easier. In older versions you need to determine the concrete indexer by matching the GUID from the bucket information (e.g. _audit~2~1A3889D7-954B-4CE6-B071-01B438DE9865) and send the REST request to the cluster peer directly.
Old method – pre v6.4 (for every indexer and bucket):
| rest /services/cluster/master/peers splunk_server=local | table id label status last_heartbeat
curl -k https://clusterpeer:8089/services/data/indexes/INDEXNAME/roll-hot-buckets -X POST -u admin:password
Now you can force the cluster master to advise the cluster peer to roll the bucket.
New method (for every bucket):
curl -k -u username:password https://localhost:8089/services/cluster/master/control/control/roll-hot-buckets -X POST -d "bucket_id=_audit~2~1A3889D7-954B-4CE6-B071-01B438DE9865"
Hope this helps..
A license violation will deactivate Splunk searches but not the indexing process. So you will not be able to query your data – but at least you never lose it.
Typically a license warning is displayed in the web console of Splunk.
This warning is fine – but if you want to get a notification through your normal monitoring and escalation process, it's simply not enough. For that reason I created a PowerShell script which queries Splunk for the amount of indexed data and creates warning or critical events in your monitoring solution (e.g. Nagios).
As in the other monitoring articles about checking client versions and connections to Forwarder Management, I'm using the Splunk PowerShell Resource Kit. Again, you will just need a Windows machine for executing the PowerShell script – your indexers can be running on non-Windows machines.
Setup monitoring using nsclient++ on Windows
Find the download for the script here.
Download and extract the files to C:\Program Files\NSClient++\scripts\splunk.
Adjust your "C:\Program Files\NSClient++\nsclient.ini" and add the external script:
[/settings/external scripts/scripts]
check_splunklicense = cmd /c echo scripts\\splunk\\check-license.ps1 -servername $ARG1$ -port $ARG2$ -username $ARG3$ -password $ARG4$ -warn $ARG5$ -critical $ARG6$; exit($lastexitcode) | powershell.exe -command -
define command{
command_name nt_nrpe_splunklicense
command_line /usr/lib/nagios/plugins/check_nrpe -t 30 -H $ARG1$ -p 5666 -c check_splunklicense -a $ARG2$ $ARG3$ $ARG4$ $ARG5$ $ARG6$ $ARG7$
}
define service{
use generic-service
host_name splunkindexer.bwlab.loc
service_description splunk license check splunk-2
check_command nt_nrpe_splunklicense!1.1.1.1!1.1.1.2!8089!admin!yourpassword!380!500
}
As you see in the command and service definitions, the first argument is the host where the PowerShell script will be executed (1.1.1.1). The second and following arguments give the Splunk indexer hostname (1.1.1.2) and the credentials for login. The values 380 and 500 are the thresholds in MB for the warning and critical triggers in Nagios.
Parameters
Here is a detailed list of the script parameters:
-servername
the server name or IP address to be checked – default: localhost
-port
port of splunkd – default 8089
-protocol
protocol to use to communicate with splunkd – default: https
-timeout
connection timeout to splunkd in milliseconds – default: 5000
-username
username to use to login to splunkd
-password
password to use with splunkd
-pool
license pool to check – default: "auto_generated_pool_download-trial";
the free version uses "auto_generated_pool_free"
-warn
warning value in megabytes
-critical
critical value in megabytes
-showpool
display all pools found on the indexer and their usage. Values can be 0 (default: don't display) or 1 (display)
If you are unsure which license pool to use, check the -showpool parameter. It will display all license pools on the indexer and the used bytes.
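To verify the check outside of Nagios, you can run the script directly from a PowerShell prompt; a sketch using the example values from above:
.\check-license.ps1 -servername 1.1.1.2 -port 8089 -username admin -password yourpassword -warn 380 -critical 500 -showpool 1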
If everything is set up correctly, you will be rewarded with a great check for your licensing and will never miss a warning again.
Things to consider BEFORE upgrading
This is the first release which massively removed features from the product. If you are using one of the removed features, this update is not for you! Check out http://support.citrix.com/article/CTX137826 for further information. Second, you need a Windows host for the Rolling Pool Upgrade. Install and update XenCenter before proceeding…
The easiest way to provide the installation media to XenServer 6.1 is NFS. Here you do not have to care about IIS MIME type issues or FTP ASCII/binary stuff – just click & serve: add the NFS Server role, create a folder, copy the content of the XenServer-6.2.0-install-cd.iso downloaded from xenserver.org into it, and create an NFS share. The first step is to add the NFS server role.
Create the NFS Share. Default permissions are fine, since they will give read access to every host.
To be on the safe side, you should check whether the NFS share can be accessed. Open the XenServer console or log in via SSH to the XenServer pool master and mount the NFS share as in my example:
mkdir /mnt/nfstest
mount -t nfs NFSServer:/NFSSHARENAME /mnt/nfstest
ls -l /mnt/nfstest
umount /mnt/nfstest
Rolling Pool Upgrade Wizard slideshow..
Now it’s time to start XenCenter and run the wizard:
After re-logging on to XenCenter you want to verify the build number.
After some research I found the reported mRemoteNG bug – https://mremoteng.atlassian.net/browse/MR-582. In the comments somebody suggested adding the /LARGEADDRESSAWARE flag to the mRemoteNG.exe binary. Tried it, fixed it!
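If you have Visual Studio installed, the flag can be set on the existing binary with the editbin tool from a Developer Command Prompt; a minimal sketch (the path is an assumption – adjust it to your installation):
editbin /LARGEADDRESSAWARE "C:\Program Files (x86)\mRemoteNG\mRemoteNG.exe"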
If you don't have Visual Studio installed, here is a link to a modified mRemoteNG.exe version 1.72.
It seems that comparable software like ASG/visionapp Remote Desktop and Microsoft Remote Desktop Connection Manager (RDCMan) had the same issue in older versions.
| stats count | eval ip="193.28.153.192" | lookup geoip clientip as ip
I got an error message, which showed that the lookup was somehow not working.
As the "geoip" lookup is implemented as a Python script, I checked the process using Procmon.
As we see, python.exe – which represents the lookup script located at c:\Program Files\Splunk\etc\apps\MAXMIND\bin\geoip.py – tries to read the MaxMind database file GeoLiteCity.dat and fails because the file is not where it is expected. In fact, the database file is located in the app folder c:\Program Files\Splunk\etc\apps\MAXMIND\bin\GeoLiteCity.dat, not the program folder c:\Program Files\Splunk\bin\GeoLiteCity.dat.
To fix the issue open the lookup script, uncomment line 5 and comment out line 6:
DB_PATH = os.path.join(os.environ["SPLUNK_HOME"], 'etc', 'apps', 'MAXMIND', 'bin', 'GeoLiteCity.dat')
#DB_PATH = ('GeoLiteCity.dat')
The same issue also applies to the Splunk Google Maps app. The command
| stats count | eval ip="193.28.153.192" | lookup geo ip
returns error code 1 instead of a pin on the map.
You have to adjust the config file c:\Program Files\Splunk\etc\apps\maps\default\geoip.conf to:
database_file = c:\Program Files\Splunk\etc\apps\maps\bin\GeoLiteCity.dat
The whole issue looks like a compatibility issue between Splunk 6.0 and 6.1. It seems that lookup scripts are now executed in a different working directory.
So if you have, for example, data missing in the MSSQL database, you might want to know whether the data is not being collected by the agent (you might want to update it) or whether the consolidation from the Firebird database to the MSSQL database is the issue.
Unfortunately, the process to connect to the local database is not well documented, so I'm documenting it here.
The necessary steps are to set up a second instance of Firebird, attach the local database and browse the database using a GUI tool.
First you need to download the proper version of Firebird database server from http://www.firebirdsql.org. Make sure you download the newest version of the same major release Citrix is using for the agent. To get the major release check the file version of “C:\Program Files (x86)\Citrix\System Monitoring\Agent\Core\Firebird\bin\fbserver.exe”.
Now install the Firebird server with a next->next->next->finish. You should use the “Classic server binary” version.
Open services.msc and search for the “firebird – DefaultInstance” service to make sure Firebird is up and running. Btw.: you will find the Citrix Edgesight Firebird Service next to it, named “Firebird Server – CSMInstance”.
As you want to work with the database using a GUI, I suggest you install Flamerobin from http://flamerobin.org/. Just install the application – “next->next->next->finish.”
Now it’s time to stop your Edgesight and Citrix Firebird services on the Edgesight device.
Fire up Flamerobin and connect to the local newly installed Firebird instance. Choose Server->Register New Server
The running server is localhost and the TCP Port 3050.
Now you need to attach the Citrix Firebird database to this server. Choose “Register existing database…” and select the “RSDATR.FDB” database file.
Make sure to use the user sysdba and password masterkey (this is the default password of the new Firebird instance) to log in.
That’s it, you’re done and can now browse and query all the tables, views and triggers of the local database.
$ESXHost = Get-VMHost esx.fqdn -ErrorAction:Stop
$dvSwitch = Get-VDSwitch -Name dvSwitch1 -Location virtualDatacenter -VMHost $ESXHost -ErrorAction:Stop
$Portgroup = Get-VDPortgroup -Name "Management Network" -VDSwitch $dvSwitch -ErrorAction:Stop
Get-VMHostNetworkAdapter -VMHost $ESXHost -VMKernel:$true -VirtualSwitch $dvSwitch -PortGroup $Portgroup -ErrorAction:SilentlyContinue | out-null
If the host has no VMkernel NIC configured on the "Management Network" portgroup of dvSwitch1, the cmdlet still throws an error message. My workaround is an empty try/catch block to suppress the error message.
try {
Get-VMHostNetworkAdapter -VMHost $ESXHost -VMKernel:$true -VirtualSwitch $dvSwitch -PortGroup $Portgroup -ErrorAction:SilentlyContinue | out-null
}
catch {
#silence
}
Also for this bug a VMware SR is open. Let’s see if this bug is fixed in the next PowerCLI version.
The VMware versions in my development environment.
PowerCLI Version
—————-
VMware vSphere PowerCLI 5.5 Release 2 Patch 1 build 1931983
—————
Snapin Versions
—————
VMWare AutoDeploy PowerCLI Component 5.5 build 1890764
VMWare ImageBuilder PowerCLI Component 5.5 build 1890764
VMware vCloud Director PowerCLI Component 5.5 build 1649227
VMware License PowerCLI Component 5.5 build 1265954
VMware VDS PowerCLI Component 5.5 build 1926677
VMware vSphere PowerCLI Component 5.5 Patch 1 build 1926677
VMware vSphere Update Manager PowerCLI 5.5 build 1302474
—————
vSphere Versions
—————
vCenter 5.5 1891313
ESX 5.5 1331820
You can change the number of uplink ports from 2 to 4 without problems, but the configuration back to two uplinks silently fails.
Set-VDSwitch -VDSwitch $dvSwitchObject -NumUplinkPorts "4" -Confirm:$false
The dvSwitch has 4 uplink ports after the configuration change.
Set-VDSwitch -VDSwitch $dvSwitchObject -NumUplinkPorts "2" -Confirm:$false
The dvSwitch still has 4 uplink ports after this configuration change. The CMDlet finishes without an error, but the configuration is unchanged. Currently I have no automation workaround for that, but I can change the settings in the vSphere Web Client manually.
I have opened a VMware SR. Let’s see if this bug is fixed in the next PowerCLI version.
The VMware versions in my development environment.
PowerCLI Version
—————-
VMware vSphere PowerCLI 5.5 Release 2 Patch 1 build 1931983
—————
Snapin Versions
—————
VMWare AutoDeploy PowerCLI Component 5.5 build 1890764
VMWare ImageBuilder PowerCLI Component 5.5 build 1890764
VMware vCloud Director PowerCLI Component 5.5 build 1649227
VMware License PowerCLI Component 5.5 build 1265954
VMware VDS PowerCLI Component 5.5 build 1926677
VMware vSphere PowerCLI Component 5.5 Patch 1 build 1926677
VMware vSphere Update Manager PowerCLI 5.5 build 1302474
—————
vSphere Versions
—————
vCenter 5.5 1891313
ESX 5.5 1331820
It is possible to switch "Failback" from false to true, but not from true to false. The CMDlet does not throw an error; the setting just stays the same. With the vSphere Client I'm able to change the failback option in both directions.
$dvSwitch = Get-VDSwitch -Name dvswitch1
$Portgroup = Get-VDPortgroup -Name "NFS" -VDSwitch $dvSwitch
$Portgroup | Get-VDUplinkTeamingPolicy | Set-VDUplinkTeamingPolicy -FailBack:$true
The CMDlet works as expected, because after this change the failback option is enabled.
$dvSwitch = Get-VDSwitch -Name dvswitch1
$Portgroup = Get-VDPortgroup -Name "NFS" -VDSwitch $dvSwitch
$Portgroup | Get-VDUplinkTeamingPolicy | Set-VDUplinkTeamingPolicy -FailBack:$false
The failback option is still enabled and the CMDlet does not throw an error.
Currently I have no workaround for it, but a VMware SR is open for this bug as well.
The VMware versions in my development environment.
PowerCLI Version
—————-
VMware vSphere PowerCLI 5.5 Release 2 Patch 1 build 1931983
—————
Snapin Versions
—————
VMWare AutoDeploy PowerCLI Component 5.5 build 1890764
VMWare ImageBuilder PowerCLI Component 5.5 build 1890764
VMware vCloud Director PowerCLI Component 5.5 build 1649227
VMware License PowerCLI Component 5.5 build 1265954
VMware VDS PowerCLI Component 5.5 build 1926677
VMware vSphere PowerCLI Component 5.5 Patch 1 build 1926677
VMware vSphere Update Manager PowerCLI 5.5 build 1302474
—————
vSphere Versions
—————
vCenter 5.5 1891313
ESX 5.5 1331820
First download the appropriate .deb package (32-bit or 64-bit) from http://www.splunk.com/download/universalforwarder. Now you can create an unattended setup of the Splunk forwarder with a shell script like this (the 64-bit forwarder is used).
#!/bin/bash
# install the package
dpkg -i splunkforwarder-5.0.1-143156-linux-2.6-amd64.deb
# accept EULA
/opt/splunkforwarder/bin/splunk start --answer-yes --no-prompt --accept-license
# change the adminpassword from changeme to Splunky
/opt/splunkforwarder/bin/splunk edit user admin -password Splunky -auth admin:changeme
# point the forwarder to forward all events to splunkserver
/opt/splunkforwarder/bin/splunk add forward-server splunkserver:9997 -auth admin:Splunky
# index and watch/monitor all files in /var/log
/opt/splunkforwarder/bin/splunk add monitor /var/log/ -auth admin:Splunky
Don't forget to adjust the server name and port as well as the user and password to match your Splunk indexer installation.
In the default installation, the Splunk forwarder binds itself to all network interfaces (0.0.0.0). As this is not necessary and a security risk, you can reconfigure it in the file /opt/splunkforwarder/etc/splunk-launch.conf by adding the following lines. After this, a restart of the Splunk daemon is necessary:
# bind splunk to localhost only
echo "# bind splunk to localhost only" >> /opt/splunkforwarder/etc/splunk-launch.conf
echo "SPLUNK_BINDIP=127.0.0.1" >> /opt/splunkforwarder/etc/splunk-launch.conf
/opt/splunkforwarder/bin/splunk restart
Create the init scripts for startup:
/opt/splunkforwarder/bin/splunk enable boot-start
description | screenshot |
---|---|
Configure a generic Scoreboard with the option to round by two decimals. | |
The generic Scoreboard throws a NumberFormatException before the data is displayed. | |
To work around the error, add the wrapper.java.additional.22 = -Duser.language=en option to the Tomcat wrapper config located in %ALIVE_BASE%/user/conf/tomcat/wrapper.conf. | |
Restart the vCOpsWebService to enable the new setting: net stop vCOpsWebService && net start vCOpsWebService | |
Update: VMware has published the KB article 2058431 with an official description of the issue.
With this SQL query you can work around that problem.
use vcops
select dateadd(SECOND, convert(bigint, StartTimeUTC) / 1000, convert(datetime, '1-1-1970 02:00:00')) as Date, Name, MessageInfo
FROM Alarm INNER JOIN AliveResource ON Alarm.RESOURCE_ID = AliveResource.RESOURCE_ID
WHERE Alarm.CancelTimeUTC IS NOT null
AND Alarm.AlarmType = 12
AND Alarm.AlarmLevel = 2
AND AliveResource.RESKND_ID = 20
order by Date desc
;
Field | Number | Description |
---|---|---|
AlarmType | 12 | Fault |
AlarmLevel | 2 | Warning |
AlarmLevel | 4 | Critical |
RESKND_ID | 18* | vCenter* |
RESKND_ID | 20* | ESX Host* |
I checked my scripts to see how I solve this problem and compared it with the default Get-VM query.
This is a short query with code that is easy to understand and maintain:
Get-Datacenter -Name "DCNAME" | Get-VM | Where-Object {$_.PowerState -eq "poweredOn"}
get-view -ViewType VirtualMachine -Filter @{"Runtime.PowerState"="poweredOn";"Config.Template"="false"} -SearchRoot $(get-view -ViewType Datacenter -Property Name -Filter @{"Name" = "^DCNAME$"} | select -ExpandProperty MoRef)
get-view -ViewType VirtualMachine -Property Name,Summary -Filter @{"Runtime.PowerState"="poweredOn";"Config.Template"="false"} -SearchRoot $(get-view -ViewType Datacenter -Property Name -Filter @{"Name" = "^DCNAME$"} | select -ExpandProperty MoRef)
I have compared the runtime of the queries in one of the larger vCenter installations. It is interesting to see that Get-VM is faster than Get-View without filters applied. VMware seems to have optimized the Get-VM cmdlet in the current PowerCLI version (5.1 Release 2 build 1012425).
Get-VM | GET-VIEW | GET-VIEW with a property filter | |
---|---|---|---|
# virtual Machines | 1159 | 1159 | 1159 |
Runtime | 23.08s | 40.53s | 6.29s |
Runtime per VM | 0.017s | 0.035s | 0.005s |
Speed up | | -56% | 366% |
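To reproduce such a timing comparison, you can wrap each query in Measure-Command; a minimal sketch for the first variant:
Measure-Command { Get-Datacenter -Name "DCNAME" | Get-VM | Where-Object {$_.PowerState -eq "poweredOn"} }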
The module provides the cmdlets Get-vCOpsResourceAttributes, Get-vCOpsDBQuery and Get-vCOpsResourceMetric. I used Get-vCOpsResourceMetric in a customer project to fetch the "Active Memory" of virtual machines to calculate new RAM reservation values. The module has saved me a lot of development time, but the runtime is too high if you have more than a few VMs.
The Get-vCOpsResourceMetric cmdlet uses several arrays to process the metric data that has been fetched from vCenter Operations. I have replaced those arrays with LinkedLists, because arrays with many entries are really slow in PowerShell/.NET. Since Luc Dekens provided that hint in his PowerCLI session during VMworld 2011, I have used it in many scripts to raise their performance. You can find some additional info on James Brundage's page start-automating.com.
I also moved the creation of the PSObject that contains the final data structure out of the inner foreach loop, and I removed the date conversion from the cmdlet. Especially the conversion from Unix timestamp to a local date format is very time consuming. The PSObject still includes the timestamp, so you can do the conversion in a later step if you need it for some metrics. You can download the updated version of the ps_vcops.psm1 module. Maybe Clint and Alan can update the original version to include the changes and provide an option to enable the date conversion.
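To illustrate the difference, a minimal sketch (not code from the module itself): appending to a PowerShell array with += re-allocates and copies the whole array on every iteration, while a LinkedList appends in constant time.
# slow: += copies the array on every append
$array = @()
1..100000 | ForEach-Object { $array += $_ }
# fast: AddLast() appends in constant time
$list = New-Object 'System.Collections.Generic.LinkedList[object]'
1..100000 | ForEach-Object { [void]$list.AddLast($_) }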
original script | optimized script with date conversion | optimized script without date conversion | |
---|---|---|---|
CMDlet | Get-vCOpsResourceMetric | Get-vCOpsResourceMetricoptimizeddateincluded | Get-vCOpsResourceMetricoptimized |
Metric | mem|active_average | mem|active_average | mem|active_average |
Timeframe | 180 days | 180 days | 180 days |
# Values | 30557 | 30555 | 30555 |
Runtime | 0h 4m 54s | 0h 3m 15s | 0h 0m 28s |
Runtime per Value | 0.009631s | 0.006371s | 0.000916s |
Speed up | | 151.16% | 1051.41% |
I used these calls to run the cmdlets; they provide the data shown in the screenshots:
$VM | Get-vCOpsResourceMetric -metricKey "mem|active_average" -startDate (Get-Date).AddHours(-4320) -includeDt:$false -includeSmooth:$false
$VM | Get-vCOpsResourceMetricoptimizeddateincluded -metricKey "mem|active_average" -startDate (Get-Date).AddHours(-4320) -includeDt:$false -includeSmooth:$false
$VM | Get-vCOpsResourceMetricoptimized -metricKey "mem|active_average" -startDate (Get-Date).AddHours(-4320) -includeDt:$false -includeSmooth:$false
If you have more than a few VMs in your environment or you want to use a big timeframe for your vCOps metrics, this optimization gives you big time savings. For more information about the usage of the HttpPostAdapter, open the documentation URL [https://vcopshost/HttpPostAdapter] of your vCOps installation.
Thanks to Clint and Alan for developing that Module!
Over the last days I was using my WiFi heavily. Every now and then I had some network issues – but I suspected the WiFi networks I was connected to, as we all know the quality of hotel WiFi. Furthermore, I had never had WiFi issues before.
Today I was sitting in an office with public WiFi. During the first working hour everything ran fine; later I got massive packet loss over the WiFi. A ping to the router and to the Google DNS looked like this:
At first I suspected a WiFi issue here as well, so I switched to my MiFi device – and after an hour connection issues occurred there too.
The packet loss occurred very sporadically – sometimes after 5 minutes, sometimes everything ran fine for half an hour or more. Disconnecting and reconnecting to the WiFi seemed to fix the issue for some time.
I verified that my notebook is really the culprit by running a ping application from my iPhone.
The Intel WiFi driver for the Centrino Advanced-N 6235 looked pretty recent, but I remembered I had done an update using the "Easy Software Manager" about 3 weeks ago, so I decided to roll back to the previously installed driver.
Rolling back to version 15.5.6.48 instantly fixed the network issues.
When searching for the driver version I found a thread at the Intel forums confirming that the network driver seems to have issues.
http://communities.intel.com/message/188548
As the Centrino Advanced-N 6235 seems to be used in multiple Samsung series, other models seem to be affected as well. For now it is not clear whether the problem is Windows 8 only.
Conclusion: Intel seems not to be interested in fixing the problem and Samsung is still(!) deploying a crappy driver with their software. Be aware – make sure not to run the recent driver version!
Update: After further tests it seems that driver version 15.5.6.48 is quite stable but has latency issues. As I am sick & tired of it and need a working solution, I decided to buy a 15 EUR mini USB dongle. Bye bye, Intel WiFi!!!
As 15-30 seconds quite often indicates TCP timeouts, we did a network trace for further analysis. The analysis showed that while the RDP client hung at "Securing remote connection…", it tried to access ctldl.windowsupdate.com.
Note – dear network admin: this is a classic example of bad network design. The client was located in an isolated network but was able to look up public targets and tried to access one of them. Because your IP firewall drops packets instead of rejecting them, the client never gets a notification that a connection could not be established and instead waits until the timeout is reached.
So if your network does allow lookups of external resources – and there are NO good reasons to do so – make sure to reject connections, at least from your own network, instead of dropping them. For maximum security, strictly disable external DNS in isolated networks to avoid DNS tunnel attacks.
Note – the response from the network admin will be: this is a classic example of blaming the network instead of the application. Make sure to configure your OS and applications correctly to avoid unnecessary network connections, disable automatic updates, and things will be fine.
Well, as only Vista and higher targets show the hangs, we suspected a Root CA update issue, found http://support.microsoft.com/kb/2677070 and disabled network retrieval.
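For reference, KB 2677070 describes turning off the automatic root certificate update via policy; a sketch of the equivalent registry setting (verify against the KB article before rolling this out):
reg add "HKLM\SOFTWARE\Policies\Microsoft\SystemCertificates\AuthRoot" /v DisableRootAutoUpdate /t REG_DWORD /d 1 /f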
Updates disabled, timeouts prohibited, mission accomplished – go home early.
I recently added a local SSD to my XenServer to check out IntelliCache. The SSD was added in addition to the local HDD, so it became device /dev/sdb. First I had to remove all existing partitions using fdisk.
[root@xenserver01 ~]# fdisk /dev/sdb
The number of cylinders for this disk is set to 19457.
There is nothing wrong with that, but this is larger than 1024,
and could in certain setups cause problems with:
1) software that runs at boot time (e.g., old versions of LILO)
2) booting and partitioning software from other OSs
(e.g., DOS FDISK, OS/2 FDISK)
Command (m for help): p
Disk /dev/sdb: 256.0 GB, 256060514304 bytes
255 heads, 63 sectors/track, 19457 cylinders
Units = cylinders of 16065 * 512 = 8225280 bytes
Device Boot Start End Blocks Id System
/dev/sdb1 * 1 13 102400 7 HPFS/NTFS
Partition 1 does not end on cylinder boundary.
/dev/sdb2 13 19458 156185600 7 HPFS/NTFS
Command (m for help): d
Partition number (1-4): 1
Command (m for help): d
Selected partition 2
Command (m for help): p
Disk /dev/sdb: 256.0 GB, 256060514304 bytes
255 heads, 63 sectors/track, 31130 cylinders
Units = cylinders of 16065 * 512 = 8225280 bytes
Device Boot Start End Blocks Id System
After this I checked the SSD for partitions using "p", found /dev/sdb1 and /dev/sdb2, and deleted them using "d" 1, 2. I verified that both partitions were deleted using "p" again. The next step is to create a default Linux partition with maximum size using "n" (you have to use an EXT3 filesystem for a local SR used by IntelliCache). Finally I wrote the partition table back to disk and quit using "w" and "q".
Command (m for help): n
Command action
e extended
p primary partition (1-4)
p
Partition number (1-4): 1
First cylinder (1-31130, default 1):
Using default value 1
Last cylinder or +size or +sizeM or +sizeK (1-31130, default 31130):
Using default value 31130
Command (m for help): w
The partition table has been altered!
Calling ioctl() to re-read partition table.
Syncing disks.
Now I had to introduce the newly created SSD partition (/dev/sdb1) to XenServer using the xe command.
xe sr-create host=xenserver01 content-type=user type=ext device-config:device=/dev/sdb1 shared=false name-label="localssd"
I got back the UUID of the newly created local SR and it appeared in XenCenter. In this example the UUID is "ad918078-aca8-2a76-81fd-7bbc4b2ba462".
Next I had to configure the storage repository for IntelliCache. Before that I disabled the XenServer host (enabled maintenance mode).
[root@xenserver01 ~]# xe host-disable host=xenserver01
[root@xenserver01 ~]# xe host-disable-local-storage-caching host=xenserver01
[root@xenserver01 ~]# xe host-enable-local-storage-caching host=xenserver01 sr-uuid=ad918078-aca8-2a76-81fd-7bbc4b2ba462
This did the job – now I could verify the configuration by listing the host properties using "xe host-param-list". This shows the local SR configured for IntelliCache as "local-cache-sr". To double-check whether IntelliCache functionality is enabled on the SR, "xe sr-param-list" can be used; check for the correct value of "local-cache-enabled".
[root@xenserver01 ~]# xe host-param-list uuid=c0251040-2ca1-4052-8f2e-aa764ce827e3
local-cache-sr ( RO): ad918078-aca8-2a76-81fd-7bbc4b2ba462
[root@xenserver01 ~]# xe sr-param-list uuid=ad918078-aca8-2a76-81fd-7bbc4b2ba462 | grep -i local-cache-enabled
local-cache-enabled ( RO): true
Just create a new super metric with the formula "co-stop / provisioned CPU cores / 20000 * 100" and add it to a super metric package as described in my Using Super Metrics to monitor CPU %READY Part 2 article.
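Expressed in super metric syntax, this looks analogous to the %READY formula from that article; the attribute keys below are placeholders, as the numeric IDs for co-stop and provisioned vCPUs depend on your installation:
(((sum($This:A<co-stop>)/sum($This:A<vCPUs>))/20000)*100)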
Create a new dashboard, add the heatmap and metric graph widgets to it, and configure both. You can find details on these steps here.
The heatmap shows you all the virtual machines with co-stop problems at a glance.
How VMware does it, and how we work around it with XenServer
VMware vSphere can inject a command from the hypervisor into the VM; VMware Tools needs to be running for that.
So with vSphere, they clone the VM, create a VM-specific sysprep file from the specification and inject that file with the help of VMware Tools. After that they start the sysprep process and reboot. The "Invoke-VMScript" cmdlet from PowerCLI is the PowerShell implementation of passing a command to the VM.
Easy, simple, reliable – but useless for XenServer, as there is no way to pass a command from the hypervisor to a VM. But even though there is no way to prepare and start a sysprep from the hypervisor, we can trigger the process from "outside" – as long as a network connection is available.
Planning the implementation.
Here is our workaround: we generate and trigger sysprep from a "helper machine". This host needs direct network access and credentials for both XenServer and the target VM.
I created a PowerShell script which handles the sysprep from your helper machine. In detail it does the following steps:
That's it: your target VM will reboot now, will auto-login with the credentials you supplied and start sysprep. Sysprep will trigger the next reboot and customize everything you prepared.
Setup example
Choose a Windows machine running at least PowerShell 2 (could be your workstation). In addition you'll need to have the XenServer PowerShell cmdlets installed.
Now you can just start a PowerShell x86 and run osc.ps1, passing the parameters xenserver01.xml and example01.xml to it.
C:\Windows\SysWOW64\WINDOWSPOWERSHELL\V1.0\powershell.exe -file .\osc.ps1 -hypervisorconfig xenserver01.xml -clonevm example01.xml
Summary
As you see, it's possible to add OS customization functionality to XenServer – even if it is some work. You'll need such functionality if you want to create an automated VM deployment.
The mentioned osc.ps1 and configuration files are available for download.
First, the webpage didn't offer the Access Gateway client for download and installation when I logged in. Strange, but I simply copied the installer file from my NetScaler's /var/netscaler/gui/vpns/scripts/vista/nsvpnc_setup64.exe to my client and installed it manually. However, after a re-login to the NetScaler and starting the VPN, my client tried to start the Java VPN client (fallback scenario).
To work around this issue, just set the user agent of your Internet Explorer lower than IE10. Start the developer tools with F12 and select Tools -> set user agent string -> Internet Explorer 9.
Although Citrix updates NetScaler releases frequently, it seems nobody participated in the Windows 8 betas. /* no comment */
]]>In the Alerts Overview you can browse through the list of the generated alerts. In the tree on the left side you can filter by resources like ESX hosts or datastores. On the top left side you have the search form. This search only filters by the “resource name” column, but not by the info column. In this example it means you cannot filter by “Resource is down”.
You can run SQL queries against the vcops tables. In this example I use a MS SQL server; the other database supported by vCOps is Oracle. The SELECT queries alerts that match the shown Alert.Info text entries. The results can then be exported into a CSV file for further reports.
use vcops
select dateadd(SECOND, convert(bigint, StartTimeUTC) / 1000, convert(datetime, '1-1-1970 02:00:00')) as Date, Name, Info
FROM Alert INNER JOIN AliveResource ON Alert.RESOURCE_ID = AliveResource.RESOURCE_ID
WHERE Alert.Info LIKE 'Lost Connection to NFS server%'
OR Alert.Info LIKE 'A possible host failure has been detected by HA on host%'
order by Date desc
;
The second example queries alarms (events) that match the shown Alarm.MessageInfo 'Connection failed for %' text entries and AliveResource.RESKND_ID 20. This number is the internal ID for the ESX host resource type.
use vcops
select dateadd(SECOND, convert(bigint, StartTimeUTC) / 1000, convert(datetime, '1-1-1970 02:00:00')) as Date, Name, MessageInfo
FROM Alarm INNER JOIN AliveResource ON Alarm.RESOURCE_ID = AliveResource.RESOURCE_ID
WHERE Alarm.MessageInfo LIKE 'Connection failed for %'
and AliveResource.RESKND_ID = 20
order by Date desc
;
I have raised a vCOps feature request for filtering by the info column. Until VMware provides that feature, you can use these queries as a workaround.
The vCenter installer provides the option to change the connection ports of the vCenter services. In my example I have changed the HTTP port to 82. You will receive a warning if you do this, but later the installer forgets its own warning.
After the installation you will see SMS and SPS service health errors in the vCenter Service Status window.
Go into the SMS configuration folder to add the port to the url. In a default installation it is
C:\Program Files\VMware\Infrastructure\VirtualCenter Server\extensions\com.vmware.vim.sms
Open the extension.xml with an editor and add the custom port to the SMS service health url.
http://localhost/sms/health.xml to http://localhost:82/sms/health.xml
Go into the SPS configuration folder to add the port to the url. In a default installation it is
C:\Program Files\VMware\Infrastructure\VirtualCenter Server\extensions\com.vmware.vim.sps
Open the extension.xml with an editor and add the custom port to the SPS service health url.
http://localhost/sps/health.xml to http://localhost:82/sps/health.xml
Restart the vCenter services or reboot the vCenter machine.
net stop "VMware vSphere Profile-Driven Storage Service"
net stop "vCenter Inventory Service"
net stop "VMware VirtualCenter Management Webservices"
net stop "VMware VirtualCenter Server"
net stop "VMwareVCMSDS"
net start "VMwareVCMSDS"
net start "VMware VirtualCenter Server"
net start "VMware VirtualCenter Management Webservices"
net start "vCenter Inventory Service"
net start "VMware vSphere Profile-Driven Storage Service"
Check the vCenter Service Status window for the solved errors. All checks now should be green.
Let's see if this problem is fixed in vSphere 5.1. I will do another test once 5.1 has been released…
2012/09/11 Update:
Today I tested the vSphere 5.1 release to see if the problem is solved. As the screenshot shows, in a new installation of vSphere 5.1 the service status problems with a custom HTTP port are gone.
http://localhost:8080/sms/health.xml and http://localhost:21200/sps/health.xml
are now configured by the installer.
2014/09/24 Update:
The port bug is back for the SMS component in vCenter 5.5 U2. Maybe other versions between 5.1 and 5.5 U2 are also affected.
The vSphere Client performance tab shows the CPU wait time in ms for all vCPUs of a virtual machine. vSphere 5 also shows you the sum of all vCPU wait times, but you are on your own to calculate what percentage of the time the vCPUs have been waiting.
With this formula, vCOps takes the CPU wait time metric, divides it by the number of vCPUs, divides that by the 20,000 ms sample interval and multiplies by 100 to give you the percentage value.
(((sum($This:A518)/sum($This:A521))/20000)*100)
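A quick worked example, assuming a VM with 2 vCPUs that accumulated 8,000 ms of summed CPU wait time in the 20,000 ms sample interval:
((8000 / 2) / 20000) * 100 = 20% wait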
After a few minutes the metric graph displays the first percentage values. Like every other metric, you can also use this one in heatmaps or other widgets, and you can define KPIs, hard thresholds and alerts for it.
In my post Using Super Metrics… you can find some background on how to configure and use super metrics.
<VirtualHost www.external.fqdn:443>
<IfModule mod_proxy.c>
ProxyRequests Off
SSLProxyEngine On
ProxyPreserveHost On
</IfModule>
<Location /splunk>
<IfModule mod_proxy.c>
ProxyPass https://1.2.3.4:8000/splunk retry=0
ProxyPassReverse https://1.2.3.4:8000/splunk
</IfModule>
</Location>
</VirtualHost>
additional Documentation: mod_proxy – Apache HTTP Server
Splunk configuration in Splunk\etc\system\local\web.conf
[settings]
enableSplunkWebSSL = 1
root_endpoint = /splunk
tools.proxy.on = True
additional Documentation
This setup is not perfect, because it needs SSL to be enabled on the Splunk web frontend. Maybe it is possible to use mod_rewrite to change the URLs between https and http.
As a conclusion, some Splunk screenshots. I'm sure Andreas will come up with a few in-depth posts about Splunk and monitoring XenApp environments. Collecting log data from vCenter and the ESX hosts is another great use case for Splunk…
Start the Mac OS Automator to create a new folder action. A folder action will be started if something changes inside a folder. If you plug in a USB flash drive, a folder with the name of the drive will be created under /Volumes and the drive will be mounted under this directory. This triggers the folder action we will create now.
Choose to create a new folder action in the new document dialog.
Open the folder dialog to select the /Volumes folder as the trigger for the folder action.
Because /Volumes is not visible in the Finder and in the file dialogs, you must use
Shift + Command + G
to open the go to folder dialog.
Now the Volumes folder shows up in the file dialog.
Input the bash script that will run rsync if the name of the USB flash drive matches one of the names in the USBNAMES array.
If you use Mountain Lion you can add a automator action for the Notification Center after the bash script that will run rsync if the name of the USB flash drive matches on of the names in the USBNAMES array. The automator action can be downloaded at [automatedworkflows.com](http://automatedworkflows.com.
## names of the USB flash drives that should be backed up
USBNAMES=( BBO-4GB BBO-8GB )
RSYNC=/usr/bin/rsync
RSYNCOPT=-avh
BACKUPFOLDER=~/USBBACKUP
MOUNTFOLDER=/Volumes
MOUNTS=( "$MOUNTFOLDER"/* )
## create backup folder if missing
if [ ! -d "$BACKUPFOLDER" ]
then
mkdir -p "$BACKUPFOLDER"
fi
## loop over all mounted volumes and back up the ones whose name is in USBNAMES
## (the array expansions and quoting also handle volume names with spaces)
for folder in "${MOUNTS[@]}"
do
for name in "${USBNAMES[@]}"
do
if [ "$folder" == "$MOUNTFOLDER/$name" ]
then
$RSYNC $RSYNCOPT "$folder" "$BACKUPFOLDER" --log-file="$BACKUPFOLDER/$name.log"
fi
done
done
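Before saving the action you can test the rsync step manually in Terminal. The additional -n flag turns it into a dry run that only lists what would be copied (volume name taken from the example above):
# dry run: nothing is written, rsync just lists the planned transfers
/usr/bin/rsync -avhn /Volumes/BBO-4GB ~/USBBACKUP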
Save the folder action and give it a meaningful name.
If you connect the USB flash drive, the workflow icon will show up in the status bar and disappear when the script has finished.
After the workflow has finished you should see the new USBBACKUP folder in your home directory, containing the backup folder for the flash drive and the rsync logfile for that drive.
It is also possible to use Growl notifications to report the status of the workflow, but I switched to the Mountain Lion notifications after I upgraded my MacBook.
]]>Heatmaps can display the current status of many objects (even thousands) in one clear view. They give you a quick answer to which datastores have a high latency, which clusters have the most ballooning virtual machines, which ESX hosts are currently not available in which cluster, and so on. This screenshot displays a heatmap configuration that shows all virtual machines in a vCenter, how many vCPUs they have configured (the size of the square) and how high their CPU ready % values are. If the values are higher than 5% the color changes from green towards red (10% or more). As I have described in part 3 of “using super metrics…”, you can create interactions between the widgets to hand over a metric from the selected object to another widget. If you click on a VM in the heatmap, the “ready %” metric will be drawn in the metric graph and also in the data distribution widget if you configure both.
Metric graphs draw a metric for one or more objects over a configured timeframe. They are good for understanding the history and development of an object: for instance, how high was the CPU usage of an ESX host or cluster a week ago, how fast does a datastore fill up with VMDKs, how many lost packets did that network interface have yesterday, and so on. I selected two VMs with different ready % values from the heatmap. In the metric graph I combined both metrics, but it is also possible to draw multiple graphs in one widget. This is useful if you have many metrics to compare or if you want to zoom into a single graph. The screenshot shows a different history for the two virtual machines. The blue VM has had very high values since August 9th. Both run in the same vCenter, but maybe on different ESX hosts. The graph gives you an indication of what could be the problem. You can drill down into the virtual machine to see on which host it runs, check if the host has problems and if other VMs on that host are also affected.
Data distribution widgets are more complex and harder to read, but they display how the values are distributed, even over two timeframes. With them you can answer what the CPU usage of a virtual machine was most of the time in the last seven days and also over the last 30 days. The same is possible for the IOps or latency of a datastore or the availability of an ESX host. Because I selected the two virtual machines in the heatmap and configured interactions for the metric graph and data distribution widgets, they also show up in this widget. The screenshot shows the graph for both VMs and the configuration for seven and 30 days. The virtual machine on top has most of its ready % values between 20-25% in the last seven days and between 0-5% plus 20-25% over the last 30 days. This virtual machine has had more problems in the last seven days than in the weeks before. If you compare this with the metric graph you will see that it is the blue VM. The other VM has slightly better values in the last seven days than in the last 30 days, and in this case that is more visible than in the metric graph. The Y axis in the graphs shows the percentage for the distribution of the values. For the top VM it means that nearly 50% of the time the ready values in the last seven days were around 23.5% (the blue spike).
I think the vCenter Operations widgets prove that data visualization is very important and helpful to give educated answers about the health of your virtual infrastructure. With the integration of more data sources and automatic relationship configuration vCOps becomes even more valuable.
]]>To create an unattended installation job, just download viclient.exe from the vCenter Server (https://vcenterserver) and start the binary. This extracts all MSI sources to your %TEMP% directory. Now you can copy this directory and create an installation batch which executes msiexec:
set src=%~dp0
start /wait msiexec /i "%src%extract\VMware vSphere Client 5.0.msi" /l*v "%temp%\VMware vSphere Client 5.0.log" /qb-
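When testing the batch it can help to check the msiexec exit code right after the call; 0 means success and 3010 means success with a pending reboot:
rem 0 = success, 3010 = success but reboot required
echo %errorlevel%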
When you use this installation batch for testing purposes on a local machine, everything runs smoothly and you get a clean installation. In the next step I tried to sequence this installation batch to get an App-V package. To sequence, you either start the App-V Sequencer GUI and create a new package with a custom installation path, or run the sequencer (like me) from the command line:
"C:\Program Files (x86)\Microsoft Application Virtualization Sequencer x64\SFTSequencer.com" /PACKAGENAME:"vsphere-client5" /INSTALLPACKAGE:"C:\temp\vsphere-client5\install.cmd" /INSTALLPATH:"Q:\vsphere-client5" /OUTPUTFILE:"C:\app-v\vsphere-client5\vsphere-client5.sprj"
The install.cmd is the installation batch mentioned before. When you now start the vSphere Client from the package and connect to a vSphere 5 server, it tells you to update your vSphere Client. Remember: the same installation batch without App-V was running fine. When I checked the startup link for the client it looked like this:
As you can see, the vSphere Client shows up as version 4.1*, but compared to the local installation the correct version of “C:\Program Files (x86)\VMware\Infrastructure\Virtual Infrastructure Client\Launcher\vpxclient.exe” should be something like 5.0.*.
This proves that something goes wrong during the sequencing process. After some research on the web I found a TechNet discussion which seems to pinpoint the root cause: the vSphere Client installer tries to create a proprietary USB service and start it. This service runs natively on a system, but not within the App-V sequencing sandbox. The installer seems to check whether the service started successfully and otherwise falls back to an old version.
As described in the discussion, I created an MST and adjusted my installation batch:
start /wait msiexec /i "%src%extract\VMware vSphere Client 5.0.msi" TRANSFORMS="%src%extract\without-usb.mst" /l*v "%temp%\VMware vSphere Client 5.0.log" /qb-
This disables the USB service during setup, and we end up with the correct link:
You could create the mentioned MST yourself or download it here.
]]>My favorite troubleshooting tool for these scenarios is netio123. It gives you a basic idea of how much throughput you can get between two endpoints. It implements server and client in the same binary and is available for multiple OSs. Small, easy and straightforward…
In the first run I tested two Windows VMs. The command “netio -t -p 5000 -s” starts the netio server on one VM on TCP/5000. To start the test from the client, run “netio.exe -t -p 5000 <server IP>”:
C:\temp\netio123\bin>netio.exe -t -p 5000 192.168.15.73
NETIO – Network Throughput Benchmark, Version 1.26
(C) 1997-2005 Kai Uwe Rommel
TCP connection established.
Packet size 1k bytes: 8419 KByte/s Tx, 9423 KByte/s Rx.
Packet size 2k bytes: 8236 KByte/s Tx, 8851 KByte/s Rx.
Packet size 4k bytes: 16263 KByte/s Tx, 18310 KByte/s Rx.
Packet size 8k bytes: 32720 KByte/s Tx, 33743 KByte/s Rx.
Packet size 16k bytes: 63435 KByte/s Tx, 65853 KByte/s Rx.
Packet size 32k bytes: 116217 KByte/s Tx, 121351 KByte/s Rx.
As you can see from the results, I got less than 10 MByte/s for small packets. That seems quite slow for a virtual network on a single host. I fired up two Linux VMs and reran the test:
root@squeeze2:~/netio123/bin# ./linux-i386 -t -p 5000 192.168.15.78
NETIO – Network Throughput Benchmark, Version 1.26
(C) 1997-2005 Kai Uwe Rommel
TCP connection established.
Packet size 1k bytes: 331380 KByte/s Tx, 337781 KByte/s Rx.
Packet size 2k bytes: 352727 KByte/s Tx, 344394 KByte/s Rx.
Packet size 4k bytes: 324983 KByte/s Tx, 325345 KByte/s Rx.
Packet size 8k bytes: 332496 KByte/s Tx, 328502 KByte/s Rx.
Packet size 16k bytes: 348690 KByte/s Tx, 357080 KByte/s Rx.
Packet size 32k bytes: 369076 KByte/s Tx, 356453 KByte/s Rx.
I consistently got 300 MByte/s, even for small packets. This seems fine. So what happens when I run the test in a mixed Windows/Linux environment?
C:\temp\netio123\bin>netio.exe -t -p 5000 192.168.15.78
NETIO – Network Throughput Benchmark, Version 1.26
(C) 1997-2005 Kai Uwe Rommel
TCP connection established.
Packet size 1k bytes: 108210 KByte/s Tx, 170729 KByte/s Rx.
Packet size 2k bytes: 129609 KByte/s Tx, 173783 KByte/s Rx.
Packet size 4k bytes: 219375 KByte/s Tx, 202754 KByte/s Rx.
Packet size 8k bytes: 280427 KByte/s Tx, 205357 KByte/s Rx.
Packet size 16k bytes: 283376 KByte/s Tx, 206990 KByte/s Rx.
Packet size 32k bytes: 239445 KByte/s Tx, 206456 KByte/s Rx.
Wow, an impressive 100 MByte/s even for small packets – that’s 10x compared to WinVM-to-WinVM. Now it’s quite clear that the network performance issues are related to the Windows VMs themselves, and only when they connect to other Windows VMs. Windows implements enhanced TCP features which you can display with the netsh command (e.g. “netsh int tcp show global”). One basic troubleshooting step is to disable these features one by one and rerun the test: “netsh int tcp set global chimney=disabled” disables TCP chimney offload, “netsh int tcp set global rss=disabled” disables receive side scaling, and so on. The breakthrough in my case was to disable autotuning:
netsh interface tcp set global autotuninglevel=disabled
C:\temp\netio123\bin>netio.exe -t -p 5000 192.168.15.73
NETIO – Network Throughput Benchmark, Version 1.26
(C) 1997-2005 Kai Uwe Rommel
TCP connection established.
Packet size 1k bytes: 94664 KByte/s Tx, 98326 KByte/s Rx.
Packet size 2k bytes: 99216 KByte/s Tx, 102267 KByte/s Rx.
Packet size 4k bytes: 99947 KByte/s Tx, 105185 KByte/s Rx.
Packet size 8k bytes: 98054 KByte/s Tx, 99594 KByte/s Rx.
Packet size 16k bytes: 99172 KByte/s Tx, 103267 KByte/s Rx.
Packet size 32k bytes: 98967 KByte/s Tx, 102743 KByte/s Rx.
This shows a 10x performance boost for small packets immediately, and the sluggish copy jobs are now performing well. So, always make sure your VM network performs as expected. The small tool netio123 (you’ll easily find it on the web) can help you with some basic tests.
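Keep in mind that autotuning is enabled for a reason. Once the underlying cause is fixed (e.g. by a NIC driver or tools update), you may want to restore the Windows default, which is “normal”:
rem display the current TCP global parameters
netsh int tcp show global
rem restore the default receive window autotuning level
netsh interface tcp set global autotuninglevel=normal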
]]>Further investigation showed that the server was accessible on TCP/443 and the client was connecting correctly; there were no hints in the XenCenter or XenServer logs. I encountered the same issues with the XAPI tool xe.exe and the PowerShell cmdlets.
I found it even more interesting that I could connect with XenCenter from other clients. The clients causing problems were VMs running inside the XenServer itself, while the other clients – such as my local workstation – connected without complications. Approaching the problem at the VM level, I noticed that it only started after the XenServer Tools were installed.
I had a look at the vNIC driver settings (the XenServer Tools update these drivers during setup) and disabled the Large Receive Offload (IPv4) setting. Et voilà – issue solved!
This issue is related to XenServer Tools 6.0.2.
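As a side note: on newer Windows guests (Windows 8/Server 2012 and later) the same driver setting can be toggled from PowerShell instead of the adapter GUI. A minimal sketch, assuming the NetAdapter module is available in the guest:
# disable Large Receive Offload for IPv4 on all adapters
Get-NetAdapter | Disable-NetAdapterLro -IPv4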
]]>
The PowerShell error message told me “the Windows Powershell snap-in ‘XenServerPSSnapIn’ is not installed on this computer”.
The explanation is quite simple: the XenServer PowerShell cmdlets are not(!) available for the x64 platform. The batch file starting the XenServer PowerShell snap-in launches powershell.exe from the C:\windows\system32… directory. While this is valid on x86 platforms, the directory has to be fixed on 64-bit platforms.
Simply change C:\windows\system32\windowspowershell.. to C:\windows\SysWow64\windowspowershel… in “C:\Program Files (x86)\Citrix\XenServerPSSnapIn\XenServerPSSnapIn.bat”.
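A hypothetical corrected launch line – the exact arguments depend on your version of the .bat; the important part is only the SysWow64 path, which loads the 32-bit PowerShell:
rem 32-bit PowerShell on a 64-bit OS lives under SysWow64, not System32
C:\windows\SysWow64\WindowsPowerShell\v1.0\powershell.exe -NoExit -Command "Add-PSSnapin XenServerPSSnapIn"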
Hopefully Citrix will adjust the installer to check for the correct platform in future releases.
]]>
If you have configured more than one datacenter within a vCenter and have also configured portgroups with the same name on dvSwitches in both datacenters, the cmdlet will return both portgroups. The reason is that the cmdlet has no option to filter by dvSwitch.
original code
Function Get-DistributedSwitchPortGroup
{
<#
.SYNOPSIS
Get Distributed Virtual Port Groups (DVPG) by name or vDS.
.DESCRIPTION
Get Distributed Virtual Port Groups (DVPG) by name or vDS.
.PARAMETER Name
Name of the DVPG to retrieve; supports wildcards.
.PARAMETER DistributedSwitch
Name of the vDS to retrieve the DVPG for.
.EXAMPLE
Get-DistributedSwitchPortGroup -Name PG02
.EXAMPLE
Get-DistributedSwitchPortGroup -DistributedSwitch vDS01
#>
[CmdletBinding()]
param(
[Parameter(Mandatory=$false
, ValueFromPipelineByPropertyName=$true
, ValueFromPipeline=$true)]
[String]
$NAME
, [Parameter(Mandatory=$false
, ValueFromPipelineByPropertyName=$true)]
[String]
$DistributedSwitch
)
Begin
{
$extraparams=@{}
$extraparams["Property"] = @(
'Name'
, 'Config.Description'
, 'Config.Type'
, 'Config.DefaultPortConfig'
, 'Config.DistributedVirtualSwitch'
, 'VM'
, 'PortKeys'
, 'AlarmActionsEnabled'
)
IF ($DistributedSwitch)
{
$vDSMoRef = Get-view -Property 'moref' `
-ViewType "VmwareDistributedVirtualSwitch" `
-filter @{'Name'=$DistributedSwitch}`
-verbose:$false |
Select-Object -ExpandProperty MoRef |
Select-Object -ExpandProperty Value
If ($Name)
{
$extraparams["filter"] = @{
'Name'=$Name
'Config.DistributedVirtualSwitch'="VmwareDistributedVirtualSwitch-$($vDSMoRef)"
}
}
Else
{
$extraparams["filter"] = @{
'Config.DistributedVirtualSwitch'="VmwareDistributedVirtualSwitch-$($vDSMoRef)"
}
}
}
If ($Name)
{
$extraparams["filter"] = @{'Name'=$Name}
}
}
Process
{
get-view -ViewType "DistributedVirtualPortgroup" -verbose:$false @extraparams |
Select-Object @{
Name='Name'
Expression={$_.Name}
},
@{
Name='Description'
Expression={$_.Config.Description}
},
@{
Name='PortBinding'
Expression={$_.Config.Type}
},
@{
Name='VLANID'
Expression={(($_.Config.DefaultPortConfig.Vlan.VlanId|%{
if ($_ -match "\d+") {$_}
elseIf ($_.Start -eq $_.End) {$_.Start}
Else {"{0}-{1}" -f $_.Start,$_.End}}) -join ",")}
},
@{
Name='NumbOfVMs'
Expression={$_.Vm.count}
},@{
Name='NumofPorts'
Expression={$_.PortKeys.count}
},
@{
Name='AlarmActions'
Expression={$_.AlarmActionsEnabled}
},
@{
Name='DistributedSwitch'
Expression={ Get-View $_.Config.DistributedVirtualSwitch `
-Property Name -verbose:$false |
Select-Object -ExpandProperty Name}
},
@{
Name='MoRef'
Expression={ $_.MoRef}
}
}
}
modified code
Function Get-DistributedSwitchPortGroup
{
<#
.SYNOPSIS
Get Distributed Virtual Port Groups (DVPG) by name or vDS.
.DESCRIPTION
Get Distributed Virtual Port Groups (DVPG) by name or vDS.
.PARAMETER Name
Name of the DVPG to retrieve; supports wildcards.
.PARAMETER DistributedSwitch
Name of the vDS to retrieve the DVPG for.
.EXAMPLE
Get-DistributedSwitchPortGroup -Name PG02
.EXAMPLE
Get-DistributedSwitchPortGroup -DistributedSwitch vDS01
#>
[CmdletBinding()]
param(
[Parameter(Mandatory=$false
, ValueFromPipelineByPropertyName=$true
, ValueFromPipeline=$true)]
[String]
$NAME
, [Parameter(Mandatory=$false
, ValueFromPipelineByPropertyName=$true)]
[String]
$DistributedSwitch
)
Begin
{
$extraparams=@{}
$extraparams["Property"] = @(
'Name'
, 'Config.Description'
, 'Config.Type'
, 'Config.DefaultPortConfig'
, 'Config.DistributedVirtualSwitch'
, 'VM'
, 'PortKeys'
, 'AlarmActionsEnabled'
)
IF ($DistributedSwitch)
{
$vDSMoRef = Get-view -Property Name `
-ViewType "VmwareDistributedVirtualSwitch" `
-filter @{'Name'=$DistributedSwitch}`
-verbose:$false |
Select-Object -ExpandProperty MoRef|Select-Object -ExpandProperty Value
If ($Name)
{
$extraparams["filter"] = @{
'Name'=$Name
'Config.DistributedVirtualSwitch'="$($vDSMoRef)"
}
}
Else
{
$extraparams["filter"] = @{
'Config.DistributedVirtualSwitch'="$($vDSMoRef)"
}
}
}
If ($Name)
{
$extraparams["filter"] = @{'Name'=$Name}
}
}
Process
{
get-view -ViewType "DistributedVirtualPortgroup" -verbose:$false @extraparams |
Select-Object @{
Name='Name'
Expression={$_.Name}
},
@{
Name='Description'
Expression={$_.Config.Description}
},
@{
Name='PortBinding'
Expression={$_.Config.Type}
},
@{
Name='VLANID'
Expression={(($_.Config.DefaultPortConfig.Vlan.VlanId|%{
if ($_ -match "\d+") {$_}
elseIf ($_.Start -eq $_.End) {$_.Start}
Else {"{0}-{1}" -f $_.Start,$_.End}}) -join ",")}
},
@{
Name='NumbOfVMs'
Expression={$_.Vm.count}
},@{
Name='NumofPorts'
Expression={$_.PortKeys.count}
},
@{
Name='AlarmActions'
Expression={$_.AlarmActionsEnabled}
},
@{
Name='DistributedSwitch'
Expression={ Get-View $_.Config.DistributedVirtualSwitch `
-Property Name -verbose:$false |
Select-Object -ExpandProperty Name}
},
@{
Name='MoRef'
Expression={ $_.MoRef}
}
}
}
If you search for a portgroup like “VMOTION”, the function also finds “VMOTION-DVUplinks-48”. With the patch the function only finds the exact name (see the usage example after the modified code).
original code
Function Get-DistributedSwitchPortGroup
{
<#
.SYNOPSIS
Get Distributed Virtual Port Groups (DVPG) by name or vDS.
.DESCRIPTION
Get Distributed Virtual Port Groups (DVPG) by name or vDS.
.PARAMETER Name
Name of the DVPG to retrieve; supports wildcards.
.PARAMETER DistributedSwitch
Name of the vDS to retrieve the DVPG for.
.EXAMPLE
Get-DistributedSwitchPortGroup -Name PG02
.EXAMPLE
Get-DistributedSwitchPortGroup -DistributedSwitch vDS01
#>
[CmdletBinding()]
param(
[Parameter(Mandatory=$false
, ValueFromPipelineByPropertyName=$true
, ValueFromPipeline=$true)]
[String]
$NAME
, [Parameter(Mandatory=$false
, ValueFromPipelineByPropertyName=$true)]
[String]
$DistributedSwitch
)
Begin
{
$extraparams=@{}
$extraparams["Property"] = @(
'Name'
, 'Config.Description'
, 'Config.Type'
, 'Config.DefaultPortConfig'
, 'Config.DistributedVirtualSwitch'
, 'VM'
, 'PortKeys'
, 'AlarmActionsEnabled'
)
IF ($DistributedSwitch)
{
$vDSMoRef = Get-view -Property Name `
-ViewType "VmwareDistributedVirtualSwitch" `
-filter @{'Name'=$DistributedSwitch}`
-verbose:$false |
Select-Object -ExpandProperty MoRef|Select-Object -ExpandProperty Value
If ($Name)
{
$extraparams["filter"] = @{
'Name'=$Name
'Config.DistributedVirtualSwitch'="$($vDSMoRef)"
}
}
Else
{
$extraparams["filter"] = @{
'Config.DistributedVirtualSwitch'="$($vDSMoRef)"
}
}
}
If ($Name)
{
$extraparams["filter"] = @{'Name'=$Name}
}
}
Process
{
get-view -ViewType "DistributedVirtualPortgroup" -verbose:$false @extraparams |
Select-Object @{
Name='Name'
Expression={$_.Name}
},
@{
Name='Description'
Expression={$_.Config.Description}
},
@{
Name='PortBinding'
Expression={$_.Config.Type}
},
@{
Name='VLANID'
Expression={(($_.Config.DefaultPortConfig.Vlan.VlanId|%{
if ($_ -match "\d+") {$_}
elseIf ($_.Start -eq $_.End) {$_.Start}
Else {"{0}-{1}" -f $_.Start,$_.End}}) -join ",")}
},
@{
Name='NumbOfVMs'
Expression={$_.Vm.count}
},@{
Name='NumofPorts'
Expression={$_.PortKeys.count}
},
@{
Name='AlarmActions'
Expression={$_.AlarmActionsEnabled}
},
@{
Name='DistributedSwitch'
Expression={ Get-View $_.Config.DistributedVirtualSwitch `
-Property Name -verbose:$false |
Select-Object -ExpandProperty Name}
},
@{
Name='MoRef'
Expression={ $_.MoRef}
}
}
}
modified code
Function Get-DistributedSwitchPortGroup
{
<#
.SYNOPSIS
Get Distributed Virtual Port Groups (DVPG) by name or vDS.
.DESCRIPTION
Get Distributed Virtual Port Groups (DVPG) by name or vDS.
.PARAMETER Name
Name of the DVPG to retrieve; supports wildcards.
.PARAMETER DistributedSwitch
Name of the vDS to retrieve the DVPG for.
.EXAMPLE
Get-DistributedSwitchPortGroup -Name PG02
.EXAMPLE
Get-DistributedSwitchPortGroup -DistributedSwitch vDS01
#>
[CmdletBinding()]
param(
[Parameter(Mandatory=$false
, ValueFromPipelineByPropertyName=$true
, ValueFromPipeline=$true)]
[String]
$NAME
, [Parameter(Mandatory=$false
, ValueFromPipelineByPropertyName=$true)]
[String]
$DistributedSwitch
)
Begin
{
$extraparams=@{}
$extraparams["Property"] = @(
'Name'
, 'Config.Description'
, 'Config.Type'
, 'Config.DefaultPortConfig'
, 'Config.DistributedVirtualSwitch'
, 'VM'
, 'PortKeys'
, 'AlarmActionsEnabled'
)
IF ($DistributedSwitch)
{
$vDSMoRef = Get-view -Property Name `
-ViewType "VmwareDistributedVirtualSwitch" `
-filter @{'Name'=$DistributedSwitch}`
-verbose:$false |
Select-Object -ExpandProperty MoRef|Select-Object -ExpandProperty Value
If ($Name)
{
$extraparams["filter"] = @{
'Name'="^$($Name)$"
'Config.DistributedVirtualSwitch'="$($vDSMoRef)"
}
}
Else
{
$extraparams["filter"] = @{
'Config.DistributedVirtualSwitch'="$($vDSMoRef)"
}
}
}
If ($Name)
{
$extraparams["filter"] = @{'Name'="^$($Name)$"}
}
}
Process
{
get-view -ViewType "DistributedVirtualPortgroup" -verbose:$false @extraparams |
Select-Object @{
Name='Name'
Expression={$_.Name}
},
@{
Name='Description'
Expression={$_.Config.Description}
},
@{
Name='PortBinding'
Expression={$_.Config.Type}
},
@{
Name='VLANID'
Expression={(($_.Config.DefaultPortConfig.Vlan.VlanId|%{
if ($_ -match "\d+") {$_}
elseIf ($_.Start -eq $_.End) {$_.Start}
Else {"{0}-{1}" -f $_.Start,$_.End}}) -join ",")}
},
@{
Name='NumbOfVMs'
Expression={$_.Vm.count}
},@{
Name='NumofPorts'
Expression={$_.PortKeys.count}
},
@{
Name='AlarmActions'
Expression={$_.AlarmActionsEnabled}
},
@{
Name='DistributedSwitch'
Expression={ Get-View $_.Config.DistributedVirtualSwitch `
-Property Name -verbose:$false |
Select-Object -ExpandProperty Name}
},
@{
Name='MoRef'
Expression={ $_.MoRef}
}
}
}
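A quick check of the patched behaviour (hypothetical portgroup and vDS names): with the anchored name filter, only the exact match is returned and the uplink portgroup stays out of the result.
# returns only the portgroup named exactly "VMOTION" on vDS01
Get-DistributedSwitchPortGroup -Name VMOTION -DistributedSwitch vDS01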
]]>There are good reasons why network installations might not work for you: ESD licensing, other PXE services in the subnet, DHCP/PXE not implemented, etc.
If you want a fully documented unattended installation of XenServer without network boot, here is the way to go: instead of booting the installer from the network, you boot from a CD/ISO file. The installation binaries and the answer file (defining your installation parameters) are located on a network location (HTTP, FTP or NFS).
I use a Debian Linux with the Apache webserver as installation repository. In the first step, copy the whole content of the installation media to the repository.
# mount the XenServer install media
mount -o loop XenServer-6.0.201-install-cd.iso /mnt/tmp
# create the repository within the Apache default website
mkdir /var/www/xs602
# copy the media content (the /mnt/tmp/. form copies the contents, not the mount directory itself)
cp -R /mnt/tmp/. /var/www/xs602/
# do not forget to set permissions
chown -R www-data:www-data /var/www/xs602/
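Before booting the installer it is worth checking that the repository answers over HTTP. The XS-REPOSITORY-LIST file should sit in the root of the copied media, so requesting its headers makes a quick smoke test (adjust the IP to your webserver):
# a 200 response means the installer will be able to fetch the repository
curl -I http://192.168.1.4/xs602/XS-REPOSITORY-LIST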
Next you need an answer file for the XenServer installer. Here is mine:
<?xml version="1.0"?>
<installation mode="fresh" srtype="lvm">
<bootloader>extlinux</bootloader>
<primary-disk gueststorage="yes">sda</primary-disk>
<keymap>de</keymap>
<hostname>xenserver01</hostname>
<root-password>putyourcleartextpasswordhere</root-password>
<source type="url">http://192.168.1.4/xs602/</source>
<admin-interface name="eth0" proto="static">
<ip>192.168.1.14</ip>
<subnet-mask>255.255.255.0</subnet-mask>
<gateway>192.168.1.1</gateway>
</admin-interface>
<name-server>192.168.1.4</name-server>
<timezone>Europe/Berlin</timezone>
<time-config-method>ntp</time-config-method>
<ntp-server>10.1.1.10</ntp-server>
</installation>
As the last step, fire up the target system and boot the CD/ISO file. At the boot screen enter “menu.c32” and use [TAB] to edit the installation string. Add “answerfile=http://yourwebserver/answerfile.xml install” after “console=ttyp0”.
That’s all, get yourself a cup of coffee and relax while your XenServer gets installed.
Even though this is not(!) a 100% automated installation (you need to boot from a CD/ISO file and type in a small string), it helps you to have a fully documented and reproducible installation. Best of all, you do not need any kind of infrastructure – just a webserver for the installation repository.
]]>
# list the installed VIBs and find the EMC NAS plugin
esxcli software vib list | grep -i emc
# remove the plugin by name
esxcli software vib remove -n EMCNasPlugin
# enter maintenance mode and reboot to complete the removal
vim-cmd hostsvc/maintenance_mode_enter
reboot
Unfortunately the Forwarder Management WebGUI only displays the OS platform of the clients, but not the installed Splunk software version.
With the Nagios plugin we want to ensure that all your clients (indexers, search heads, forwarders) are running at least the defined Splunk version; otherwise an error will be generated.
You need to have Forwarder Management implemented for this check. On the client side you just need to point the forwarder to one Forwarder Management server, which can be any Splunk server in your environment. You can set the Forwarder Management server with the command
$SPLUNK_HOME$/bin/splunk set deploy-poll bd20.bwlab.loc:8089
and check it using
$SPLUNK_HOME$/bin/splunk list deploy-poll
The Nagios plugin queries Forwarder Management for the client list and compares every client against a minimum build level you can define. The plugin is a PowerShell script communicating with the REST API of Splunk, so it has to be executed from a Windows device. That does not mean the Splunk instance running the Forwarder Management role has to be installed on the Windows machine: if you run Splunk on Linux or Mac, you just need a Windows machine in your environment which executes the script against the non-Windows Splunk instance.
You can download the plugin from here. It uses some functions from the Splunk PowerShell Resource Kit which is also included in the download.
In this example the Forwarder Management server runs on the same machine as the Nagios client NSClient++. If you need to do an indirect query because your Splunk server runs on a non-Windows machine, simply adjust the IP in the Nagios service definition. Download and extract the files to C:\Program Files\NSClient++\scripts\splunk
[/settings/external scripts/scripts]
splunkfwmanagementversion = cmd /c echo scripts\\splunk\\check-deploymentclientsversion.ps1 -servername $ARG1$ -username $ARG2$ -password $ARG3$ -minbuild $ARG4$; exit($lastexitcode) | powershell.exe -command -
# 'nt_nrpe_splunkfwmanagementversion' command definition
define command{
command_name nt_nrpe_splunkfwmanagementversion
command_line /usr/lib/nagios/plugins/check_nrpe -t 30 -H $HOSTADDRESS$ -p 5666 -c splunkfwmanagementversion -a $ARG1$ $ARG2$ $ARG3$ $ARG4$
}
define service{
use generic-service ; Name of service template to use
host_name bd20.bwlab.loc
service_description Splunk FW Management Clients Version
check_command nt_nrpe_splunkfwmanagementversion!localhost!admin!mypassword!220630
}
After reloading the Nagios config you should verify the status of the check. It should look like this if everything is running smoothly.
Parameters
You can also run the PowerShell script manually for testing (see the example after the parameter list). The script accepts multiple parameters:
-servername
Servername or IP address of the Deployment Server/Forwarder Management
-port
Port of splunkd – default 8089
-protocol
Protocol to use to communicate with splunkd – default: https
-timeout
Connection timeout to splunkd in milliseconds – default 5000
-username
Username to use to login to splunkd
-password
Password to use with splunkd
-minbuild
Build number the clients are checked against; has to be passed as an integer value. If a client runs on a lower build, a critical message is generated.
example build numbers
version 4.3.3 = build 128297
version 6.0.2 = build 196940
version 6.1.1 = build 207789
version 6.1.3 = build 220630
version 6.1.4 = build 233537
version 6.1.5 = build 239630
version 6.2.0 = build 237341
version 6.2.1 = build 245427
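For reference, a manual test run with the values from this example could look like this (run from the extracted scripts directory, hypothetical credentials):
# check that every client runs at least build 220630 (Splunk 6.1.3)
.\check-deploymentclientsversion.ps1 -servername bd20.bwlab.loc -username admin -password mypassword -minbuild 220630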
]]>If you use Forwarder Management (also known as Deployment Server) to configure your infrastructure, you really want to make sure your clients/forwarders are up and running. In the Splunk web interface you have a page for this within Settings->Forwarder Management:
To ensure that a client points to the Deployment Server, check the configuration in $SPLUNK_HOME$/etc/system/local/deploymentclient.conf or run the “splunk show deploy-poll” command. To set the Forwarder Management server use “splunk set deploy-poll SERVER:8089”.
By default a client calls back to the Forwarder Management server every 60 seconds. If communication fails, the output looks like this:
The phone home interval can be configured in $SPLUNK_HOME$/etc/system/local/deploymentclient.conf using the phoneHomeIntervalInSecs parameter.
The Nagios plugin asks Forwarder Management whether every client has phoned home correctly. The plugin is a PowerShell script communicating with the REST API of Splunk, so it has to be executed from a Windows device. That does not mean the Splunk instance running the Forwarder Management role has to be installed on the Windows machine: if you run Splunk on Linux or Mac, you just need a Windows machine in your environment which executes the script against the non-Windows Splunk instance.
You can download the plugin from here. It uses some functions from the Splunk PowerShell Resource Kit which is also included in the download.
[/settings/external scripts/scripts]
splunkfwmanagement = cmd /c echo scripts\\splunk\\check-deploymentclients.ps1 -servername $ARG1$ -username $ARG2$ -password $ARG3$ -warn $ARG4$ -critical $ARG5$; exit($lastexitcode) | powershell.exe -command -
# 'nt_nrpe_splunkfwmanagement' command definition
define command{
command_name nt_nrpe_splunkfwmanagement
command_line /usr/lib/nagios/plugins/check_nrpe -t 30 -H $HOSTADDRESS$ -p 5666 -c splunkfwmanagement -a $ARG1$ $ARG2$ $ARG3$ $ARG4$ $ARG5$
}
define service{
use generic-service ; Name of service template to use
host_name bd20.bwlab.loc
service_description Splunk FW Management Client Connectivity
check_command nt_nrpe_splunkfwmanagement!localhost!admin!mypassword!5!30
}
After reloading the Nagios config you should verify the status of the check. It should look like this if everything is running smoothly.
In case of an error it will look like this:
You can also run the PowerShell script manually for testing (see the example after the parameter list). The script accepts multiple parameters:
-servername
Servername or IP address of the Deployment Server/Forwarder Management
-port
Port of splunkd – default 8089
-protocol
Protocol to use to communicate with splunkd – default: https
-timeout
Connection timeout to splunkd in milliseconds - default 5000
-username
Username to use to login to splunkd
-password
Password to use with splunkd
-warn
time in seconds (default 5) which a client is allowed to be overdue before a warning is generated; depends on the configured phoneHomeIntervalInSecs (default 60) in the client settings
-critical
time in seconds (default 300) which a client is allowed to be overdue before a critical is generated; depends on the configured phoneHomeIntervalInSecs (default 60) in the client settings
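And again for reference, a manual test run matching the service definition above (hypothetical credentials):
# warn after 5 seconds overdue, critical after 30 seconds
.\check-deploymentclients.ps1 -servername bd20.bwlab.loc -username admin -password mypassword -warn 5 -critical 30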
]]>For wrapping/creating .mdx files on a Mac you’ll need to install JDK 1.7, Xcode, the Xcode Command Line Tools and the Citrix MDX Toolkit.
This is how the MDX Toolkit looks when running in a VM:
compared to a non-virtual Mac:
It looks quite unusable. But hey, the MDX Toolkit is just a GUI application triggering a command line. For iOS it triggers the CGAppCLPrepTool application with some parameters. Let’s check if the command line works:
./CGAppCLPrepTool Wrap -Cert "iPhone Distribution: Joe Public (ABCDEF1234)" -Profile "citrix_distribution.mobileprovision" -in "myapplication.ipa" -out "myapplication.mdx" -appdesc "doing this stuff from commandline"
As a result you’ll get the expected .mdx file – upload this file to the App Controller and it will work.
For wrapping applications for Android you’ll need the JRE, the Android SDK and the MDX Toolkit installed. Let’s check what happens if you copy the whole /Applications/Citrix/MDX Toolkit folder to a Windows machine.
First we need to create a Keystore:
"c:\Program Files\Java\jdk1.7.0_67\bin\keytool.exe" -genkey -dname "cn=Android, o=Android, c=US" -keystore C:\temp\demo.keystore -storepass android -alias wrapkey -keypass android -keysize 1024 -sigalg SHA1withRSA -keyalg RSA
Next step is to wrap the application – I’m wrapping the Citrix Worx Mail app in this example:
"c:\Program Files\Java\jdk1.7.0_67\bin\java.exe" -jar C:\temp\MDXToolkit\ManagedAppUtility.jar wrap -in c:\temp\apps\CitrixEmail9.0-release.apk -out c:\temp\apps\CitrixEmail9.0-release.mdx -keystore C:\temp\demo.keystore -storepass android -keyalias wrapkey -keypass android
No errors – looks fine, worx fine… which means you can wrap Android applications on Windows. You just need somebody to install the MDX Toolkit once and provide you with the extracted installation files.
]]>