Archive for VistaDB

Dec
07

VistaDB 4.3 Performance Optimization

Posted by: | Comments (2)
Download VistaDB 4.3

Download VistaDB 4.3

We are happy to announce the biggest update to the VistaDB engine since Gibraltar Software took over the product last year. It’s actually our fifth update, though our earlier releases were more limited in scope consisting of a new streamlined licensing system and a number of bug fixes—often providing closer compatibility with SQL Server scripts.

The main focus of VistaDB 4.3 is query performance, particularly with multiple JOINs. We’re pleased to report 2.5x improvements in many cases and discuss below what’s happening inside the VistaDB engine to achieve these results. Let’s start with an overview of how VistaDB produces query results.

How Does My Query Work?

VistaDB queries are executed in three general phases:

  • Parse: The SQL query text is parsed into a tree of objects representing each language element.
  • Prepare: The statement/expression tree is recursively processed to identify table and column references and to determine the data types of results.
  • Execute: The statement/expression tree is recursively processed to execute statements and provide results in order (returning to the calling application as each row is ready).

If there were no optimizations at all, the engine would walk every row of the parent table. And for each of those rows, then walk every row of the first joined table looking for matches to the ON clause. For each of those rows it would then repeat the process for every row of every additional table referenced in the query. Obviously, this would be ridiculously slow when joining multiple tables of any real size, or even when querying a single table with a large number of rows when you don’t actually want most of them.

To be more efficient, VistaDB performs an additional optimization step at the start of the Execute phase. This step recursively walks the WHERE clause and ON clauses and converts the comparison expressions (and special functions such as BETWEEN) into a more efficient representation as constrains in which each constraint specifies a range of values for a particular column based on constants, parameter values, or the value of a column from an earlier table in the parse tree.

In optimizing the parse tree, VistaDB simplifies the execution plan into a series of constrained tables you could imagine as being evaluated left-to-right. Constraints that can’t be resolved are declared non-optimizable and must be handled by testing the WHERE clause. Similarly, target columns for which there isn’t an available index are also non-optimizable and must be tested row-by-row. These optimized conditions are then processed for logical ANDs and ORs to calculate an overall optimized filter for each table.

Building On What Already Works Well

VistaDB has always done well with queries in which a range of rows can be retrieved on a single-column index with an identifiable starting and ending value based on the current rows of tables “to the left” of it. For example, if the parent (FROM) table is restricted by a single comparison in the WHERE clause such as: WHERE ParentTable.ColA = @ChosenValue; (with an index on ColA), then the engine doesn’t need to walk every row of ParentTable, it can start with the first row in the index with a value of @ChosenValue for ColA, and walk each row in the index until it passes the last row with a value of @ChosenValue for ColA. If another table is then joined in it doesn’t need to consider any combinations outside of that range on ParentTable; they’re already certain to be excluded by that condition in the WHERE clause.

Improvements in VistaDB 4.3

We use VistaDB extensively in our Gibraltar application monitoring system and noticed that VistaDB performance left something to be desired for some of our more complex queries. For example, it is common to use a placeholder ID (perhaps a UNIQUEIDENTIFIER) as a foreign key into a small lookup table which can contain additional information fields universal to that value. In Gibraltar we have tables such as Application_Type and Boot_Mode which provide caption and description labels for display purposes. They can be joined directly into a query about one or more sessions, like so:

SELECT * FROM Session_Details SD
	JOIN Processor_Architecture OSA
		ON OSA.PK_Architecture_Id = SD.FK_OS_Architecture_Id
	JOIN Boot_Mode BM ON BM.PK_Boot_Mode_Id = SD.FK_OS_Boot_Mode_Id
	JOIN Processor_Architecture RA
		ON RA.PK_Architecture_Id = SD.FK_Runtime_Architecture_Id

The joined tables are tiny, only 5 or so rows each, so (in theory) this should be very efficient. Each joined table can have its unique matching row directly looked up based on the corresponding column value in the parent table. Perfect, right? But this query was taking several seconds. The base query (SELECT * FROM Session_Details) took less than half a second, and that’s querying the entire table! What we found in VistaDB 4.2 was that as each JOIN was added to this query, the overall query time nearly doubled! Something was clearly less efficient than it should be.


As we analyzed the engine internals, we found a lot of opportunities to improve performance which we’ll be implementing over the coming year. As a first step, we decided to focus in VistaDB 4.3 on reducing the overhead for multiple joins and optimize for the most common cases.

We expect that the majority of joins will be on a single equality between a single column from each table with a foreign key relationship between them. Since this should by definition identify a single value (and often a single row), it should be the most efficient filter to narrow down the joined table based on those “to the left”. So the optimization logic will now catch these top-priority conditions early on and bypass the rest of the expensive reduction pass.

And in queries like our example, we integrated column value caching to eliminate the need to search for the same rows over and over again. When a table is joined on a UNIQUE single column index, the table can cache the row in a Dictionary keyed by the column value from the other table, and each time it comes back to that value, it grabs the row from cache instead of searching the index and reading it from disk again.

We also coded our caching to ensure that it doesn’t consume excessive memory when processing large tables. The cache only holds hard references to the most recently accessed rows. By using weak references to less recently used rows they stick around when memory is plentiful but can be garbage-collected if necessary. For more info on on weak references, check out Kendall’s Code Project article and sample code on creating a single instance string store.

As shown in the graph above, VistaDB 4.3 is several times faster for many common queries. More importantly, in queries such as above, performance degrades linearly as more tables are joined, rather than exponentially as before.

Stay Tuned for More to Come

The query optimizations we’ve introduced in VistaDB 4.3 are just a start. Subsequent releases will have additional query optimizations as well as other performance improvements such as support for bulk insert and enhanced multi-user scalability. We also will be adding new features such as enhanced support for Entity Framework, enhanced compatibility with SQL Server and improvements to our development tools (Data Builder, Data Migration Wizard, etc).

We’ll be writing additional blog posts about our adventures taking VistaDB to the next level, so check back often or leave a question/comment below or in our support forums –we’d love to hear from you!

Sep
15

Gibraltar Software Acquires VistaDB

Posted by: | Comments (26)

When we started Gibraltar Software we always intended that it would grow to encompass multiple products that each fit with our common mission of helping .NET developers build rock solid software.

We’re thrilled to announce that Gibraltar Software is the new home for VistaDB – an embedded SQL Server-compatible database engine.  As you may recall, we’ve been enthusiastic about this product since we found it while doing the initial development of Gibraltar.  Over time we only got more interested in its potential to help developers in all kinds of situations where you want a database but don’t want SQL Server.

We use it ourselves in every part of our product – Analyst, Hub Server, and Agent.  There’s even a small database in every package file.  While we know that often SQL Server is a great answer as you scale up there are a lot of places where a small, portable database engine that’s upward compatible with SQL Server is just what the doctor ordered.

We’d been in discussions for some time with the owners of VistaDB about ways we could bring the two products together and finally found a way that made good sense for both parties.  Accordingly, we’re taking over full development, support, and sales for VistaDB as of October 1st.

What’s this mean for Gibraltar?

There are several great things that will come out of this.  First off, we will be able to use our real-world experience in Gibraltar to help optimize the performance of VistaDB.  There’s nothing like having to live and die by your own software to make you pay attention.  This will benefit all of the users of Gibraltar by improving its scalability and performance.

Secondly, while these products are both for the .NET market they’re far enough apart to have different sales cycles.  This is important because it means we can afford to put more revenue into product development instead of having to hold it back for the lean times that inevitably arrive in any single market.

Third, the added exposure of VistaDB will help us market Gibraltar – and that will provide more & better feedback on where the product needs to go.  Thanks to the additional revenue we’ll be able to act on that feedback faster.  In short, your investment in software maintenance with Gibraltar just got more valuable.

Finally, we are going to add staff to help us ramp up to handle both products and ensure that we can keep delivering enhancements at the pace you’ve come to expect since Gibraltar went live.

What’s this mean for VistaDB?

Much of the same things that apply for Gibraltar also apply for VistaDB – more resources for improving the core engine, confidence that we are as dependent as you are on having it work great, and the comfort that the product will be supported for years to come.

We will be developing a roadmap for VistaDB development over the next month based on feedback from the existing community.  Meanwhile, the current site will stay up with some minor changes while we work to integrate the sales & user communities.

Our first priorities will be to establish the ability to get trial versions and handle licensing for VistaDB.  Concurrent with that we’re going to set up the right support channels so you can get the same exceptional service that Gibraltar customers have enjoyed.  The vibrant user forum at VistaDB will also stay up.

Meanwhile, you can continue to access your VistaDB account, perform activations of older versions, and generally carry on as you did before.

First!

If you’d like to get your ideas to the head of the queue and are an existing VistaDB customer, send us email to support at GibraltarSoftware.com.  We’ll be doing some formal feedback gathering, but why wait and risk being missed – get us your ideas today.  We’ve proven repeatedly we listen to our customers when figuring out what to develop.

For the formal announcement, check out the press release.

Categories : ISV, VistaDB
Comments (26)

kick it on DotNetKicks.com
One key requirement of the Gibraltar Agent is to be able to manage the data files it creates on disk to ensure that they can’t grow out of control.  After looking at a lot of options, we determined that we need a central index to track the locally generated files.  The problem was that it had to be absolutely safe to use from multiple processes without risk.  We had some simple xml-based ideas to solve the problem, but early prototypes were not encouraging.

Fortunately, before we launched into getting more and more aggressive with solving the problem someone on the team stumbled over VistaDB, a fully managed database that we could merge into our agent.  We wrote a quick technical prototype and were impressed:  Not only could we safely throw all the data we needed to track into it even in an extended run of our torture test, but we were able to do the evaluation of whether we needed to prune files or not (and what files to prune) within it which made for a fast, clean, maintainable index.

Even with the success of this prototype, we were very reluctant to go down this road.  Our previous generation solution had use MS Jet for a data store, and it had been a source of problems.  We had ported that to SQL Server, and SQL had become a source of problems for some of our clients.  We’d internalized the lesson that databases and logging do not mix if you want an easy to deploy, foolproof system.  We decided to cast a wider net and look at a range of options:

  • SQL Express / SQL Server Embedded: Microsoft’s free offerings.  SQL Express was right out because it was a windows service and would make our deployment complicated and huge.  SQL Server Embedded couldn’t solve our problem because only one process can access the database files at a time.  And oh yeah, it would make our deployment somewhat complicated and large.
  • Other third parties: Without getting into an exhaustive list, we decided it really had to be a completely .NET native, managed implementation that we could merge with our assembly because we were only going to ship one assembly for the agent.  Furthermore, it needed to support syntax at least largely similar to SQL Server so we wouldn’ t have to master multiple environments.
A Dashboard of sessions run locally

A Dashboard of sessions run locally

In the back of our minds was another consideration:  We’ve always intended to grow Gibraltar into offering a larger version for enterprises with centralized log storage and management for many computers.  It was a lead pipe cinch that this solution was going to use SQL Server, so the closer we could be to that on the client the better off we’d be.  The more we worked with VistaDB, the more we started to wonder:  Did we really need SQL Server even for our future larger version?  Could we perhaps just use VistaDB?  We did more prototyping and came to the conclusion that it could work very well technically.

At this point, we knew we had a winner in VistaDB.  A quick prototype showed that we could easily target both it and SQL Server with exactly the same code at each level.  Nothing else could do that:  We could use the same schema, stored procedure code and database access code and switch between SQL Server and VistaDB with none the wiser.

If I see so far, it’s because I stand on the shoulders of giants

local_severity_by_applicationThe best was yet to come.  With VistaDB it’s easy to create a new instance of a database anytime we needed our index data structure.  Opening and closing databases is relatively fast, so we were able to have a common relational database structure available everywhere.  This meant that we could have just one data model everywhere instead of separate ones for data collection and session data management.  Better yet, we could have one set of database access code for both cases, even if we ultimately supported SQL Server.

This opened a lot of opportunities:

  • The same data bound charts and views could be generated against any repository, allowing immediate analysis without having to copy data first into a central store.
  • Very large repositories could be supported because the index data didn’t need to be loaded into memory for processing
  • We were able to redirect the time saved not developing our own XML persistence format and structure into customer value-add features.
  • We could add a reporting system that would require rich, hierarchal data structures.

local_session_durationIn short, if it wasn’t for VistaDB most of the features in the repository view and the reporting system of Gibraltar would only be available in a future release that used a central server.  In a small way VistaDB did cause us to spend more development time than we would have – we got so excited about the potential for some of the features that we’d written off as being too expensive to implement that we held the release until we could get them done.

Now, we could have written our own thing for a lot of these pieces, but it’d taken a lot more time (particularly since we target .NET 2.0, not 3.5 so no LINQ for us) and there are some parts that we’d have always been worried about – namely the fundamentally hard problem of having may processes accessing the same file doing reads and writes.  That’s just a hard problem to get right period, and it was great to hand it off to someone who worried as deeply about that as we do about logging and metrics.  In the end, it’s a great example of not reinventing the wheel unless you want to learn a lot about wheels.  We already knew a lot about databases and shared files, enough to know that we didn’t want to learn any more.  We were much more interested in digging into the areas that added unique value to Gibraltar.

Beware the gift Trojan horse

One last concern we had was that adopting anything into our agent would create a long term obligation for us to support it.  Customers would expect that multiple agent versions would safely interoperate on the same computer.  Furthermore, because we would use the same format in our Package files used to send session data between computers we’d be required to support it for years.  We’d have to get comfortable that this was feasible.  To this end, we needed the solution to either be available completely through source code we had or through a company that we were absolutely sure would support it long term.

A great incentive for long term support is revenue.  People in general and for-profit companies in particular are motivated most directly by the idea that people will pay them to do something.   This was another place where we were a bit concerned with using one of the options from Microsoft because they were all free.   Microsoft’s motivation for creating these products is primarily defensive – prevent people from using other free options in the hope that they’d eventually upgrade to one of the nicely expensive server options.  That’s fine and good if you’re targeting one of those options and just want a free scale down option, but this was central to our product.

We could have gone open source, but there are a few issues there for us:

  1. No open source option even came close to offering what VistaDB did – namely the ability to support stored procedures compatible with SQL Server.
  2. We’d then largely be on the hook for our own source code support if the community wasn’t doing what we needed, and the whole point here was to not have to write something.
  3. We’d have to very carefully scrutinize the open source license to make sure we didn’t get GPL’d into oblivion.

Part of our concerns were mitigated by VistaDB offering a source code license, so we could get source code to make our own changes if we had to.  But really, I wasn’t looking to ever need to write this code, that’s why I wanted to get someone else’s solution.  With VistaDB, we found not only a strong community of folks that have built products around it, just like we were, but also a small company owned by a guy that believed fundamentally that folks like us had to succeed with his product for him to stay in business.  His goals tightly aligned with our needs, and that’s a great precondition for success.  That’s good, but there’s another big condition for success:  Was this company focused on listening to its customers?  Would it respond to our concerns?

As is our tradition, we sent an unsolicited block of product feedback to VistaDB.  There were some good things, but we had some concerns as well.  We got a point by point response from Jason Short, the CEO of VistaDB.  Better yet, we got an invitation to a conference call.  Now, at this point VistaDB had gotten a total of $300 from us.  We spent over 7 times that on the fancy graphing component we use, and compared to the criticality of it to our system I’d have spent a lot more to solve the problem we had.   I was impressed by a series of conversations we’ve had with Jason and his team, and how dedicated they are to doing the right thing in solving the challenges they’re up to.

When you think about how much money it costs to create software of any sophistication, it’s great to be able to pull off a piece of your complexity and hand it to someone who will care more about it than you do.  We’re so happy with how much VistaDB added to what we could offer our customers that you’ll notice in our About Gibraltar page that we’re proudly Powered by VistaDB.

about_gibraltar

kick it on DotNetKicks.com

Categories : .NET, Data, Development, Logging
Comments (4)