Archive for Performance

Dec
07

VistaDB 4.3 Performance Optimization

Posted by: | Comments (2)
Download VistaDB 4.3

Download VistaDB 4.3

We are happy to announce the biggest update to the VistaDB engine since Gibraltar Software took over the product last year. It’s actually our fifth update, though our earlier releases were more limited in scope consisting of a new streamlined licensing system and a number of bug fixes—often providing closer compatibility with SQL Server scripts.

The main focus of VistaDB 4.3 is query performance, particularly with multiple JOINs. We’re pleased to report 2.5x improvements in many cases and discuss below what’s happening inside the VistaDB engine to achieve these results. Let’s start with an overview of how VistaDB produces query results.

How Does My Query Work?

VistaDB queries are executed in three general phases:

  • Parse: The SQL query text is parsed into a tree of objects representing each language element.
  • Prepare: The statement/expression tree is recursively processed to identify table and column references and to determine the data types of results.
  • Execute: The statement/expression tree is recursively processed to execute statements and provide results in order (returning to the calling application as each row is ready).

If there were no optimizations at all, the engine would walk every row of the parent table. And for each of those rows, then walk every row of the first joined table looking for matches to the ON clause. For each of those rows it would then repeat the process for every row of every additional table referenced in the query. Obviously, this would be ridiculously slow when joining multiple tables of any real size, or even when querying a single table with a large number of rows when you don’t actually want most of them.

To be more efficient, VistaDB performs an additional optimization step at the start of the Execute phase. This step recursively walks the WHERE clause and ON clauses and converts the comparison expressions (and special functions such as BETWEEN) into a more efficient representation as constrains in which each constraint specifies a range of values for a particular column based on constants, parameter values, or the value of a column from an earlier table in the parse tree.

In optimizing the parse tree, VistaDB simplifies the execution plan into a series of constrained tables you could imagine as being evaluated left-to-right. Constraints that can’t be resolved are declared non-optimizable and must be handled by testing the WHERE clause. Similarly, target columns for which there isn’t an available index are also non-optimizable and must be tested row-by-row. These optimized conditions are then processed for logical ANDs and ORs to calculate an overall optimized filter for each table.

Building On What Already Works Well

VistaDB has always done well with queries in which a range of rows can be retrieved on a single-column index with an identifiable starting and ending value based on the current rows of tables “to the left” of it. For example, if the parent (FROM) table is restricted by a single comparison in the WHERE clause such as: WHERE ParentTable.ColA = @ChosenValue; (with an index on ColA), then the engine doesn’t need to walk every row of ParentTable, it can start with the first row in the index with a value of @ChosenValue for ColA, and walk each row in the index until it passes the last row with a value of @ChosenValue for ColA. If another table is then joined in it doesn’t need to consider any combinations outside of that range on ParentTable; they’re already certain to be excluded by that condition in the WHERE clause.

Improvements in VistaDB 4.3

We use VistaDB extensively in our Gibraltar application monitoring system and noticed that VistaDB performance left something to be desired for some of our more complex queries. For example, it is common to use a placeholder ID (perhaps a UNIQUEIDENTIFIER) as a foreign key into a small lookup table which can contain additional information fields universal to that value. In Gibraltar we have tables such as Application_Type and Boot_Mode which provide caption and description labels for display purposes. They can be joined directly into a query about one or more sessions, like so:

SELECT * FROM Session_Details SD
	JOIN Processor_Architecture OSA
		ON OSA.PK_Architecture_Id = SD.FK_OS_Architecture_Id
	JOIN Boot_Mode BM ON BM.PK_Boot_Mode_Id = SD.FK_OS_Boot_Mode_Id
	JOIN Processor_Architecture RA
		ON RA.PK_Architecture_Id = SD.FK_Runtime_Architecture_Id

The joined tables are tiny, only 5 or so rows each, so (in theory) this should be very efficient. Each joined table can have its unique matching row directly looked up based on the corresponding column value in the parent table. Perfect, right? But this query was taking several seconds. The base query (SELECT * FROM Session_Details) took less than half a second, and that’s querying the entire table! What we found in VistaDB 4.2 was that as each JOIN was added to this query, the overall query time nearly doubled! Something was clearly less efficient than it should be.


As we analyzed the engine internals, we found a lot of opportunities to improve performance which we’ll be implementing over the coming year. As a first step, we decided to focus in VistaDB 4.3 on reducing the overhead for multiple joins and optimize for the most common cases.

We expect that the majority of joins will be on a single equality between a single column from each table with a foreign key relationship between them. Since this should by definition identify a single value (and often a single row), it should be the most efficient filter to narrow down the joined table based on those “to the left”. So the optimization logic will now catch these top-priority conditions early on and bypass the rest of the expensive reduction pass.

And in queries like our example, we integrated column value caching to eliminate the need to search for the same rows over and over again. When a table is joined on a UNIQUE single column index, the table can cache the row in a Dictionary keyed by the column value from the other table, and each time it comes back to that value, it grabs the row from cache instead of searching the index and reading it from disk again.

We also coded our caching to ensure that it doesn’t consume excessive memory when processing large tables. The cache only holds hard references to the most recently accessed rows. By using weak references to less recently used rows they stick around when memory is plentiful but can be garbage-collected if necessary. For more info on on weak references, check out Kendall’s Code Project article and sample code on creating a single instance string store.

As shown in the graph above, VistaDB 4.3 is several times faster for many common queries. More importantly, in queries such as above, performance degrades linearly as more tables are joined, rather than exponentially as before.

Stay Tuned for More to Come

The query optimizations we’ve introduced in VistaDB 4.3 are just a start. Subsequent releases will have additional query optimizations as well as other performance improvements such as support for bulk insert and enhanced multi-user scalability. We also will be adding new features such as enhanced support for Entity Framework, enhanced compatibility with SQL Server and improvements to our development tools (Data Builder, Data Migration Wizard, etc).

We’ll be writing additional blog posts about our adventures taking VistaDB to the next level, so check back often or leave a question/comment below or in our support forums –we’d love to hear from you!

Had a great time presenting at DotNetDevNet in Bristol last night – thanks to everyone for coming out from all over southern England.

You can see the slides below and Download the Parallel Processing Demo.

Categories : Speaking
Comments (0)
Jul
04

.NET String Theory

Posted by: | Comments (0)

kick it on DotNetKicks.com

I love talking with real customers; no matter how many times you do it you discover new uses for your product and new edges you never knew where there before.  We’d designed Gibraltar with certain sizing assumptions based on predecessor systems we’ve created for clients and feedback from the beta period.  One thing you can always count on is that no matter how much you optimize a system, users will push it until it breaks.

We were talking with a customer recently that was using Gibraltar to instrument a system that produced 150MB of data in a two minute test run.  Now, we use a very aggressive amount of compression so it takes a lot to generate 150MB of data - millions of data points.  What was interesting is that it caused a great deal of memory to be used up both in the agent and in the Analyst when the session was opened.  Digging through the details, we worked out that nearly all of the memory was being used for strings, most of which were in memory many times because they came from different places.

Our data compression format already pushes string reuse when reading data, but reusing strings while collecting is more challenging (because they’re coming from outside of Gibraltar).  Additionally, strings happen in places you don’t expect – like every time you display a value, you’re inevitably converting it into a string.

One approach to help with this is Interning strings which gives you a way to take a live string and swap it for the one common object.  This leverages the fact that strings are immutable in .NET so if two strings are the same they’re interchangeable.  Unfortunately, .NET takes interning to an extreme – once you intern a string, it stays in memory until the AppDomain is unloaded.  For our purposes that’s a nonstarter, it would be a huge memory leak and kill the Analyst (which in extreme cases needs to load 1GB of data, then drop it once the session is closed).

Cutting Strings down to Size

Instead, we’ve spent a few day writing and testing a single instance string store for both the Agent and Analyst.  Initial results are very encouraging – we were able to reduce memory utilization in some extreme cases (like using a very large Live Viewer buffer on a running application) down by 90% with only a 5% processor load penalty.  Better yet, since the Agent is designed to be asynchronous we’re doing all of this optimization work behind the curtain where it doesn’t affect application response time.

Many users won’t notice a difference – if you’re not using the Live Viewer or don’t log much there isn’t much to optimize (and the overhead of the single instance store is zero).  If you have a lot of RAM then it won’t really matter because the OS will just throw things at your app anyway.   But when you want to go crazy and log a great deal of data and view it in real time it’s great to have that safety buffer there.

As a typical example, you can do a before and after comparison of our dynamic binding sample app:  We reduced the working set of the application by 128MB under heavy logging.  You never know when that extra memory will come in handy, and I love that we’re doing things that ensure you never need to worry about what our Agent is up to when running your app.  Here’s a chart of the application running on a system with a great deal of memory (so there is no memory pressure) logging over 2.8 Million messages:

Memory Usage for Sample Application (2.8 Million Messages)

Memory Usage for Sample Application (2.8 Million Messages)

The only difference between the two applications is the version of our agent (the upper lines are Agent build 414, the lower is the latest internal build 430).  Keep in mind the improvements are not percentage but absolute.  Since the sample app does nothing but log it has a disproportionate effect.

The other thing you have to remember is that you really only have about 1.5GB of memory to work with as a 32-bit process, so doing what you can to both reduce fragmentation of RAM and keep your footprint down for the really extreme cases is great.  This single instance store trick is particularly useful because the more strings there are, the higher the probability the next one is already in the list.  This means the amount of memory reduction increases with the amount of memory used.

Give it a spin on your project

You can try this all for yourself – check out our article on CodeProject where you can download the complete source code so you can get these benefits for your own project!
kick it on DotNetKicks.com

Categories : .NET, Development
Comments (0)

kick it on DotNetKicks.com
As we were gearing up for the commercial release of Gibraltar one of the last things on our plate was a final pass of performance and memory optimization.   You’ve probably read a lot about the dangers of preemptive optimization, and while I’m generally a fan of that philosophy as a team we just can’t shake the desire to write the fastest code we can as we go along.  Usually that means right out of the gate our early betas are pretty responsive and we don’t have many hot spots to look into.

man-clipboardWhen it does come time to look into problem areas, it never ceases to amaze me that every developer has an opinion on what’s causing the performance issue that they’re completely confident in…  And invariably wrong about.  This is one reason I’m a big fan of having folks on your team that are experienced at code profiling, typically using one or more profiling tools.  These tools (and the experienced developer that knows how to use them) create a set of facts about what is really happening that’s far more accurate than developer instinct at improving the real world performance.

One reason this is so is because of a natural tendency to assume that less code will run faster than more code.  This is  instinctive because:

  • It’s natural to assume that each line of code represents work, and therefore more lines is more work.
  • Opaque lines (such as keywords, runtime calls, or calls to third party libraries) tend to be viewed as all having similar performance because they are a black box.

Interestingly, making a program faster almost always involves giving it more lines of code, sometimes a lot more lines.  This is more apparent in environments like .NET than in thinner environments (like Visual C++/MFC) because the underlying code libraries are so large.  When you can display a sortable, pagable grid of data from a database in 10 lines of code it can feel very wrong to write two hundred lines of code to make it fast.

Code Gravity

Another reason that instinctive performance optimization is generally not effective is that the slowest operations are also the most abstracted away from the developer.  Even on a modern computer there are some operations that can be measured by a stopwatch:

  • Network communication: The time it takes to establish a network connection, move one byte of data to another computer, and get one byte in response is massive.  While there’s a lot of layers of abstractions to make this look fast, those facades can’t cheat nature all of the time.
  • Graphical User Interfaces: Drawing a UI page on screen, whether it’s rendering a web page or displaying a window, takes a lot of time.

string_phoneUnfortunately, these areas are likely to stay slow because there are natural laws at work that thwart performance improvements.  That means while the days of worrying about which index in an array you’re iterating over first are behind us, you still need to worry about every call that might go across a network, even if it’s just one line (and it often is).

Connecting to a remote server even when things go well is much slower than doing a very inefficient sort of 1000 objects – at least 100 times so.  It’s even worse than that: Doing a sort is a very consistent operation; on a given computer it’ll take basically the same amount of time every time.  Connecting to a remote server is a highly variable operation:

  • Is there a network interface available?
  • Does it have its IP address?
  • Do you have the server in your DNS cache  or do we have to look it up in DNS? (which takes a separate connection, query, and result before we can even start what you wanted)
  • Is the other server still available?
  • and on and on…

It’s all Relative

Finally, when you look at a section of code you’ll tend to think in terms of the percentage you could improve that area, but you can’t mentally put that in the context of the entire time of the application.  Perhaps that filter routine could be made 30% faster by having it check for some common boundary cases first, but if the filter is 0.1% of the total runtime of your application, that’s wasted time – and possibly very costly because you’ll need to test each of those optimizations.

With Gibraltar Analyst, we had a few counterintuitive optimizations that made a big difference:

  • With smaller data views the time it takes to rebind the grid views we’re using is relatively large (even though it’s just one line of code) so it was worth comparing updated datasets with the displayed dataset to make sure there were changes before rebinding.
  • We are calculating some joins, tops, and filters in memory manually instead of running more optimized queries because it lets us do one database query and use the results in many places, which is particularly faster on new multi-core computers where we can process the variations in parallel.
  • We were going to some lengths to avoid reading file data unnecessarily.  It turns out that even on a basic laptop disk performance for small files is not bad.  Slower than most things, but much faster than it used to be.  Users were refreshing the entire display to force it to read data when we guessed wrong which was a far more expensive operation.  We were better off reading the files each time.

Interestingly, as a team we’d marked several areas ahead of time we were sure were going to need optimization later.  When we profiled the application in use none of the marked areas was above the noise – in fact in total they accounted for less than 5% of the processor utilization of the application.   Even with an 80% performance improvement the user would only see a 4% change in the application, well under the threshold of perception.

And make it look fast

The final odd thing we ran into is something akin to what Microsoft discovered with Vista:  People have poor mental stopwatches.  After a range of improvements that we know objectively improved our startup time significantly as well as other key operations we got feedback from our user community that they thought the app was actually slower.

soap_box_derby_racerWhen we dug into this we discovered that users weren’t interpreting the operations as ending when the wait cursor went away and they could interact with the application but instead when all visible changes were complete.

We’d improved performance by pushing more work into asynchronous events, but when these completed they were triggering minor user interface recalculations that would result in very small shifts in small parts of the screen (as it recalculated the correct size of the data displays).  Users were used to this small visual twitch and treated the application as being busy until they saw all of that stop, sure that they needed to wait.

In this case, we did two things:  We added checks at the end of these routines to make sure the data changed before allowing the size calculation to eliminate the visual twitches (at the expense of some processor time, but remember that comparing data is much faster than moving pixels around on screen) and we moved many things back into the foreground to make sure they were completed as quickly as possible.

So in the end, for systems that interact with users it isn’t about achieving paper performance but doing what feels the fastest to the end users.  Often they’re the same thing, but the only way to know for sure is with user experience testing.
kick it on DotNetKicks.com

Categories : Development
Comments (0)