Archive for Development

One last update to Gibraltar 2

We’ve done several functional releases this year and we’re going to do one more minor one before we close the books on 2.x. This last update is going to include some memory usage optimizations, reliability tweaks, and the first version of the Gibraltar Add In API.

Memory and Reliability Optimization

There are a few scenarios where we use more memory than we should in the Agent. These scenarios all revolve around sending larger sessions via email or writing them to a package file (Hub users are not affected because the Hub exchanges data completely differently). We’re changing how these operations are performed to emphasize using constant (and less) memory instead of optimizing for the fastest performance. For smaller sessions the performance optimizations don’t matter, and for larger sessions it’s preferable to ensure we don’t interfere with the running application rather than finishing in the shortest time possible. We make this trade-off by not explicitly buffering data in RAM and instead using self-deleting temp files for interim storage. If your computer has enough RAM the OS will cache these files anyway, and if it doesn’t you’ll be thankful we aren’t using what little you have.
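
To make the trade-off concrete, here is a minimal sketch of the general technique in Python. It is not the Agent's code (the Agent is a .NET library), and the names `stage_in_temp_file`, `copy_stream`, `package_session`, and `CHUNK_SIZE` are made up for illustration. The point is only that the data moves through a fixed-size buffer and a temp file that deletes itself when closed, so memory use stays flat no matter how large the session is.

```python
import io
import tempfile

CHUNK_SIZE = 64 * 1024  # fixed buffer size, so memory use does not grow with the session


def copy_stream(source, destination, chunk_size=CHUNK_SIZE):
    """Copy source to destination one chunk at a time; return the bytes copied."""
    total = 0
    while True:
        chunk = source.read(chunk_size)
        if not chunk:
            break
        destination.write(chunk)
        total += len(chunk)
    return total


def stage_in_temp_file(source, chunk_size=CHUNK_SIZE):
    """Spool a readable binary stream into an anonymous temp file."""
    # TemporaryFile is deleted automatically as soon as it is closed.
    staged = tempfile.TemporaryFile()
    copy_stream(source, staged, chunk_size)
    staged.seek(0)  # rewind so the caller can read from the start
    return staged


def package_session(session_stream, package_stream):
    """Hypothetical packaging step: stage on disk first, then write the package."""
    with stage_in_temp_file(session_stream) as staged:
        return copy_stream(staged, package_stream)
```

This is slower than holding everything in one big in-memory buffer, which is exactly the trade described above: at most one chunk is held in RAM at a time, and the OS file cache makes up much of the difference when memory is plentiful.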

AddIn API

We’ve been working on this as part of the 3.0 plan but think there’s enough there to release it early. This is the first time you’ll be able to extend the Gibraltar Analyst itself using your own code. Our primary goal was to provide a safety valve – a way you can handle analysis and integration scenarios we’re not handling yet. If you want to integrate with your own customer service or defect tracking system, create your own visualizations, or do something else entirely, this will provide a way to get it done.

You’ll be able to extend Analyst in two ways:

  1. Analyze every session: Whenever a session is added to the user repository Analyst checks through it to record error statistics.  You can extend this pipeline to add on your own inspection, looking at the entire log or just the errors.  This is done automatically in the background.
  2. Custom session commands: You can add commands to the context menus wherever sessions are being managed to work with specific sessions on request. You can choose whether to provide your own user interface or run in the background. Register your command and it will automatically show up on session folders, in grids, and in other places where sessions are being managed.

What can you do with this?  Well, a few ideas are:

  • Forward details about errors to an external customer service or defect tracking system: With an Analysis add in you can check each session for errors and create an email or do whatever you need to open a new ticket.  You’ll have access to everything in the session file.
  • Export data into a data warehouse: If you have your own external system for tracking overall metrics, your analysis add in can write data to a file, upload it to a database, or do whatever else you need.
  • Create your own visualization: Since you know exactly how you’re recording data (log messages and metrics) you might want to create your own special graphical treatment that leverages that knowledge to provide quick insight.

Share and Share Alike

A best practice is to ship an extension API with several great examples that show best practices.  Well, we didn’t want to hold it back long enough to get them done so we’re going to be looking to our community for great examples to share.  If you create an extension that others might be interested in, let us know – we’re going to set up a place to share extensions with the community.

We’re interested in hearing from the community on how you want to be able to extend Gibraltar – Analyst, Agent, and Hub – so we can incorporate those ideas into 3.0 and beyond.

And then on to 3.0

We have a queue of items we’re already working on that are too big to fit into a dot release of the product. For example, we haven’t changed the data format of Gibraltar since the early betas. There are some issues that we simply can’t fix without adjusting the format, which will create a situation where older versions of Analyst won’t be able to read the newer data. We aren’t willing to do that in a dot release.

We also are going to break new ground in 3.0 on the CEIP front with an all-new application management view of the session information.  When we get closer to our first beta we’ll release more on that and other enhancements.

New features for all our friends

Every Gibraltar customer will get a free upgrade to Gibraltar 3.0 when it ships.  This is the power of our software maintenance in action – every license includes a year of it to make sure you get the most out of your investment.  We don’t have a specific date in mind for shipping 3.0 but expect it to be late Q2/early Q3.

Categories : Development
Comments (0)
Feb
19

Gibraltar 2.1.1 Released

Posted by: Kendall | Comments (1)

We’ve published Gibraltar 2.1.1 and you can download it right away. There are a number of great enhancements in this release. We’ve already covered a few of them in earlier posts.

Based on feedback on the beta release we’ve added some additional capabilities:

  • Send Session Now: If you subscribe to the Message Alert event you can send the current session immediately to the Hub or Email based on the current configuration by setting one property.  Check out the code sample to see how.
  • Easy email notifications: You can leverage the same email configuration the Agent is configured with to send messages within your application for any reason.
  • Anonymous Data Collection: Session data can be anonymized during collection so no personally-identifying information is sent to you.  Just set one option and you’re good to go.
  • Detailed .NET Memory Counters: You can now enable detailed memory performance counters that monitor the .NET CLR’s garbage collector and memory monitoring.  Very useful for monitoring for memory leaks in production applications.
  • PostSharp Enhancements: We’ve made argument tracking more sophisticated so you can do more without compromising the performance of your application.

We also fixed a number of defects (23) that mostly apply to edge cases, but no defect is minor when it affects you. In particular, our CEIP identified an error on first time startup in several cultures that prompted users to restart Analyst. Ouch. Fortunately we were able to figure it out and fix it. We addressed Thread Ids too.

Hub Subscriptions Live Too

As of February 15 the Gibraltar Hub Service is fully live. You can get a free 30 day trial and then, if you like what you see, you can subscribe for terms from one month to one year at a scale that works for you – from a single laptop up to your whole large team. There’s no long term commitment, and you can even easily migrate from the Hub Service to your own private hub down the road if you want to.

This Release Made Possible By People Like You

We say it all the time, but this release in particular was driven entirely by end-user requests. We’re working on the next major release of Gibraltar but we stepped back to address requests from our customers and a few prospects as well. When you read the list of everything we’ve done, other than a few defects we found internally and through the CEIP, it is all based on what our customers felt was most important. Are we missing something you need? Let us know: we’ve proven we listen again and again and again.

You can read a thorough list of the new features, defect fixes, and changes at What’s New in Gibraltar 2.1.1.

Categories : .NET, Development, ISV
Comments (1)

We had a customer quiz us about why one of our thread names was showing up on some of their log messages. We looked into the problem and were a bit baffled. We name all of the threads we create inside the Agent to ensure we can separate what they do from any client application. The name in question is used by a thread that the Gibraltar Agent creates and then destroys relatively early in the process. This thread isn’t taken from the thread pool or put back into one. We confirmed it gets created and released, so there just seemed no way that they could be processing on our thread.

We checked the data up and down and were confident that it wasn’t a data corruption problem – the only assumption made by the code was that Managed Thread Ids are unique.  This seemed pretty reasonable: the documentation for the ManagedThreadId property reads:

Thread.ManagedThreadId: Gets a unique identifier for the current managed thread.

But, we kept digging and found another scenario on a long running ASP.NET application where a similar event occurred – a thread that was created and destroyed relatively early in the application was clearly now in the thread pool and handling events.  Researching more, we found this gem in the documentation.  Not on the MSDN documentation for ManagedThreadId but rather for Thread.GetHashCode:

The hash code is not guaranteed to be unique. Use the ManagedThreadId property if you need a unique identifier for a managed thread.

OK, that still points us to ManagedThreadId as the right guy for our use. But then there’s this note on the Thread Class itself:

GetHashCode provides identification for managed threads. For the lifetime of your thread, it will not collide with the value from any other thread, regardless of the application domain from which you obtain the value.

This started to raise some concern: that little bit of weasel room in the second sentence is troubling. “For the lifetime of your thread”… Was .NET reusing thread Ids after a thread exits? The wiggle room in the statement above made that sound possible, even though there’s no reason the hash code and the thread Id are necessarily related. My first read of this was that the qualification was about the second part of the sentence – uniqueness across application domains (which we never assumed).

So we created a few brutal tests – creating and destroying threads then ramping up the thread pool’s activity.  Sure enough, the same Managed Thread Ids showed up in the thread pool.  These weren’t the same threads – the thread static variables we were using for tests had been reset – but they had the same Managed Thread Id.

Go Team

The fix for us is to not rely on Managed Thread Id for correlating events to threads.  Instead, we’re using an internal thread static variable to track the relationship and identify it with our own unique identifier.  Because we track the thread responsible for log messages and many other things we record we had to represent this in the smallest amount of data feasible, and remain backwards/forwards compatible with existing data.
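
The idea carries over to other runtimes. As a rough analogy only (this is Python, not the Agent's .NET code, and `tracking_id` and `ids_from_sequential_threads` are names invented for this sketch), Python's `threading.get_ident()` is also documented as possibly being recycled once a thread exits. Assigning each thread its own identifier from a counter that only ever moves forward, and keeping it in thread-local storage, avoids depending on the runtime's Id at all:

```python
import itertools
import threading

_next_id = itertools.count(1)  # process-wide sequence; values are never handed out twice
_id_lock = threading.Lock()
_local = threading.local()     # per-thread storage, the analog of a thread static


def tracking_id():
    """Return an identifier unique to the calling thread for the life of the process."""
    try:
        return _local.tracking_id
    except AttributeError:
        # First call on this thread: take the next value from the shared sequence.
        with _id_lock:
            _local.tracking_id = next(_next_id)
        return _local.tracking_id


def ids_from_sequential_threads(count):
    """Run `count` short-lived threads one after another and collect their ids."""
    seen = []

    def work():
        # Pair the runtime's id (may repeat) with our own id (never repeats).
        seen.append((threading.get_ident(), tracking_id()))

    for _ in range(count):
        thread = threading.Thread(target=work)
        thread.start()
        thread.join()  # the thread is gone before the next one starts
    return seen
```

Running `ids_from_sequential_threads(20)` gives twenty distinct tracking ids, while the runtime's own identifiers in the first column are free to repeat because each thread has exited before the next one is created.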

We’ve updated the display to automatically generate unique display names to separate out threads with the same Ids and had to make a range of other adjustments to ensure we treat the Managed Thread Id as nothing more unique than a display name. That way you’ll be sure that if two events are ascribed to something called “Thread 14”, they really are the same thread. All of the changes for this are included in Gibraltar 2.1.1, which will ship within the next few days (this was the last issue we needed to resolve before shipping).

Incomplete is worse than Missing

The frustrating part is that if the documentation had never made any claim about the uniqueness of the thread Id we’d likely have gone through a set of proof and qualification testing.  Like many people, when there isn’t documentation on something we have to create experiments to tease out the true behavior, review source code, and then decide what risks we want to take.  This is one reason we are passionate about documentation, even at the expense of extra features.  We want to make sure that you never have a doubt about what something on our API does.  We also know that people don’t want to review documentation if they don’t have to – so we try hard to make the API understandable just from Intellisense.

Now, I don’t want to knock Microsoft too hard here – .NET is a massive framework even if you just look at the core .NET 2.0 API. But as we all rely more and more on ever increasing layers of abstraction over what’s really going on, it’s more important than ever to be precise in the documentation – about what something is and what it isn’t. Being precise is more important than being comprehensive, because it sets the right expectation for people about what they can rely on and what they’ll have to verify for themselves.

Categories : .NET, Development, Logging
Comments (5)
Hippocrates - 460-377 BC

Hippocrates’ Primum non nocere, “First do no harm”

Several customers have requested a notification mechanism to be alerted when errors are detected in their programs.  Simply raising an event is straightforward, but our promise to our customers is that we’ll do the hard thinking that ensures Gibraltar is safe and robust in production systems.  Our mantra is: first, do no harm.

In this case, we asked ourselves questions like:

  • What if a customer’s error notification logic is slow?  How do we ensure that it doesn’t slow down the application as a whole?
  • What if the program starts screaming thousands of errors?  How do we ensure that we don’t swamp the error notification handler?
  • What if there are errors in the customer’s error notification handler?  What if it throws an exception?  What if it hangs?

This resulted in a design that ensures that the logging infrastructure (including Gibraltar itself AND customer logic that interfaces with it) will be robust and safe.

Our central Log object in Gibraltar Agent now has a MessageAlert event that is raised when warning, error, or critical messages are recorded.  This event has a number of safety features such as:

  • Asynchronous: The event is raised on a background thread that is not part of the logging path, ensuring that time spent handling the event will not slow down logging or affect other threads.
  • Batching: When a burst of qualifying messages is recorded, they will typically be raised together to allow more efficient processing.
  • Throttling: A minimum delay between events can be easily specified to ensure the event isn’t raised too frequently, particularly in error cascade scenarios.  Messages are batched up until the next time the event can be raised.
  • Hang Protection: If the event handler never returns, the Agent will continue to process messages but will not queue them for the event, allowing them to be released from memory.
  • Loop Protection: Messages that are recorded by your event handler will not cause additional events to be raised.  This prevents notification loops where an event handler records an error during notification which subsequently causes the message alert notification to be raised again.
  • Low Overhead: We don’t spin up anything (the threading, queue, etc.) until someone subscribes to the event so if you don’t use this feature it doesn’t take up resources either.
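
To illustrate how several of these safety features fit together, here is a small sketch in Python. It is an illustration of the pattern only, not the Agent's implementation or API: `AlertDispatcher`, `publish`, and `min_interval` are hypothetical names, and hang protection and the lazy start-up described above are left out to keep it short. It shows asynchronous delivery on a background thread, batching, a minimum delay between events, loop protection, and a handler that can throw without harming the caller.

```python
import queue
import threading
import time


class AlertDispatcher:
    """Hypothetical helper that raises batched alerts on a background thread."""

    def __init__(self, handler, min_interval=0.0):
        self._handler = handler
        self._min_interval = min_interval      # minimum seconds between events
        self._pending = queue.Queue()
        self._in_handler = threading.local()   # set only on the worker thread
        self._last_raised = float("-inf")
        self._worker = threading.Thread(target=self._run, daemon=True)
        self._worker.start()

    def publish(self, message):
        """Called from the logging path; never waits for the handler."""
        # Loop protection: drop messages recorded by the handler itself.
        if getattr(self._in_handler, "active", False):
            return
        self._pending.put(message)

    def close(self):
        """Deliver whatever is still queued, then stop the worker."""
        self._pending.put(None)  # sentinel marking the end of the stream
        self._worker.join()

    def _run(self):
        stopping = False
        while not stopping:
            first = self._pending.get()  # block until a message arrives
            if first is None:
                break
            # Throttling: wait out the remainder of the minimum interval.
            wait = self._last_raised + self._min_interval - time.monotonic()
            if wait > 0:
                time.sleep(wait)
            # Batching: take everything that accumulated in the meantime.
            batch = [first]
            while True:
                try:
                    item = self._pending.get_nowait()
                except queue.Empty:
                    break
                if item is None:
                    stopping = True
                    break
                batch.append(item)
            self._deliver(batch)
            self._last_raised = time.monotonic()

    def _deliver(self, batch):
        self._in_handler.active = True
        try:
            self._handler(batch)
        except Exception:
            pass  # a failing handler must not take the dispatcher down
        finally:
            self._in_handler.active = False
```

A handler receives a list of messages rather than one message at a time, so a burst of errors costs one call instead of thousands. Because `publish` only enqueues, a slow handler delays later alerts but never the code that is doing the logging.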

The MessageAlert event is particularly useful for automatically triggering immediate data transmission in the case of an error and implementing your own error notification mechanism.  The full detail of each log message is available in the event.

Check out our recent post on charting enhancements for more examples of how we are incorporating customer feedback to ensure that Gibraltar provides a robust logging infrastructure allowing you to build rock solid .NET software.

Categories : .NET, Development, Logging
Comments (2)