Monday, December 31, 2007

A Nice Side Effect Of Query Cache

I came across an interesting advantage of the query cache over MSSQL's internal caching strategy today. Because the query cache is simplistic - it stores each query together with its result set and serves the pre-computed result whenever the identical query comes in again - it turns out to be great for repetitive data analysis in a more OLAP-style environment.

For example - if the client application issues a lot of repeated queries and the data isn't updated very frequently - say once a week - the MySQL query cache beats MSSQL's strategy, and the gap widens the more often you repeat a query and the longer the stretch between data updates.

The upshot seems to be that MySQL is a great back end for things like neural analysis, and even for testing and storing the results of multiple regression analyses.

As a practical example - I have a regression equation in a client application and I want to test and store the results of the regression given various weights on different variables. Because the base data being fed into the regression equation doesn't change, testing a universe of 100 variations over 12 variables means running 1,200 regressions against the same base data coming out of the database. By computing the result the first time and serving cached answers every time after that, the query cache puts the onus of speed back on the client application, spreading the possible performance bottleneck across the whole system instead of concentrating it in the database.
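For the curious, here's a rough sketch of how you can check that the cache is actually doing the work - the 64MB size is just a placeholder, tune it to your own server:

---- below is an example ----

-- check that the query cache is turned on and how big it is
SHOW VARIABLES LIKE 'query_cache%';

-- give it some room to work with (64MB here is just a placeholder)
SET GLOBAL query_cache_size = 67108864;

-- after running the same SELECT a few times, the hit counter should climb
SHOW STATUS LIKE 'Qcache_hits';
SHOW STATUS LIKE 'Qcache_inserts';

------------------------------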

In short, MSSQL doesn't do this except when its proprietary caching strategy sees fit. In my tests MySQL came out about 5-6 times faster even after the MSSQL cache had adapted, and almost 20 times faster on the second through fifth regressions.

Wednesday, June 27, 2007

A knock on MySQL

In the course of the translation, it made sense to set up a replica of the MySQL db on the same host as the MSSQL db. Ok, all well and good, except that there seems to be an issue in MySQL, not at all well documented, whereby the slave chokes on large update statements - citing an Error 5 Out of Memory.

Luckily there is a quick fix - the replication thread tells you the SQL statement that beat it - just copy that statement into a client and execute it yourself. On its next retry the slave attempts a much smaller (or cached) update and peacefully carries on to the next statement.
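To make the sequence concrete, here's a sketch - the UPDATE below is just a stand-in for whatever statement actually shows up in Last_Error on your slave:

---- below is an example ----

-- on the slave: find the statement that choked the SQL thread
SHOW SLAVE STATUS\G
-- Last_Errno: 5, and Last_Error shows the UPDATE that ran out of memory

-- paste that exact statement into a normal client session and run it by hand
-- (this UPDATE is only a placeholder for whatever Last_Error reports)
UPDATE some_big_table SET some_col = 'x' WHERE some_flag = 1;

-- if the SQL thread has stopped, kick it off again
START SLAVE;

------------------------------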

Watch Out - Order of Inner Joins

Just a quick one here - when porting SQL statements from MySQL to MSSQL you need to mind the order of your joins. Whereas MySQL apparently re-orders the joins so that the columns used in one join are available to the next, MSSQL quickly barks at you unless you join your tables in an order where no ON clause references a table that hasn't already appeared earlier in the statement.
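A quick illustration with made-up tables of the kind of ordering SQL Server objects to, and the reordering that keeps it happy:

---- below is an example ----

-- SQL Server complains here: the first ON clause references the customers
-- table before it has been brought into the statement
SELECT o.id, c.name, r.region_name
FROM orders o
INNER JOIN regions r ON r.id = c.region_id
INNER JOIN customers c ON c.id = o.customer_id;

-- reorder the joins so nothing is referenced before it appears
SELECT o.id, c.name, r.region_name
FROM orders o
INNER JOIN customers c ON c.id = o.customer_id
INNER JOIN regions r ON r.id = c.region_id;

------------------------------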

Pretty linear limitation for a product that's supposed to be built on set-based relational theory rather than linearity.

Saturday, April 14, 2007

... invalid in the select list because it is not contained in either an aggregate function or the GROUP BY clause.

Ok, so the first major stumbling block I've run into seems to be a common problem. I'm making the error message the title of this post in case someone else is searching for the answer.

Basically - any time you want to select a list of columns from a table but also group the results because you're using sum() or max() or the like, SQL Server will not let you do it unless you're grouping on every column in the select list - which doesn't really help, because grouping on everything would just give you all the rows back!

This problem often arises where you have a table that records events along with a datestamp signifying when each one occurred. Let's say you want to query the table and see only the MOST RECENT occurrence of an event - max(datestamp_col) works great. Now let's say you also want that result row alongside other columns to get more meaningful info about the event - BAM - the grouping isn't allowed.
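To put the error in context, this is the shape of query that trips it, using the same made-up logons table as the working example further down:

---- below is an example ----

-- SQL Server rejects this: DC is in the select list but is neither
-- aggregated nor named in the GROUP BY clause
SELECT [Name], MAX(LastLogon) AS LastLogon, DC
FROM logons
GROUP BY [Name];

------------------------------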

There are actually two solutions to the problem. The first is to compute the max value in a subquery (a derived table) and join back to it - this is the most elegant, but not exactly novice-friendly or intuitive.

---- below is an example ----

SELECT rs.LastLogon, rs.[Name], lo.DC, lo.Location
FROM (
    SELECT [Name], MAX(LastLogon) AS LastLogon
    FROM logons
    GROUP BY [Name]
) rs
INNER JOIN logons lo
    ON rs.[Name] = lo.[Name]
    AND rs.LastLogon = lo.LastLogon
ORDER BY rs.LastLogon DESC

------------------------------


The second solution works only for things like max() - skip the aggregate entirely, use "TOP 1" (yes, this is like LIMIT 1 in MySQL) and order by the date column instead. Hardly a universal solution though, so you're stuck with the first one in most cases.
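A sketch of that second approach, again against the made-up logons table - note it only ever hands back one row overall, not one row per name:

---- below is an example ----

-- most recent logon overall, no aggregate and no GROUP BY needed
SELECT TOP 1 [Name], LastLogon, DC, Location
FROM logons
ORDER BY LastLogon DESC;

------------------------------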

Anyway, for those who have written extensive queries including columns not in the aggregation - have fun rewriting these!!

Friday, April 13, 2007

My First Rant - Importing Tables

So the server arrived...finally. I log into Microsoft SQL Server Management Studio, having by now gotten some experience with it through Microsoft SQL Server Express.

Oh, by the way - if you don't need to do a bunch of db admin stuff, the Express edition is a great way to get familiar with the basic layout of Management Studio - half the battle here, folks.

I'm not expecting the world, so I start to look into porting the tables over from MySQL to Microsoft. I create the new database and right-click on the little drum icon - "Tasks" allows me to import a table. This is pretty cool. I've got my MyODBC driver loaded - a slow but reliable connector - I select the ODBC option and it asks me for a DSN or connection string.

At this point you could paste in a connection string, or just create a user DSN, which is much easier to change later if needed. Plus, you can test the connection properties and get a readable response if it fails, for those of us not fluent in hex-commando.
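If you do go the connection string route instead of a DSN, it looks something like this - the driver name depends on which MyODBC version you installed, and the server, database and credentials are obviously placeholders:

---- below is an example ----

DRIVER={MySQL ODBC 3.51 Driver};SERVER=my-mysql-host;DATABASE=mydb;UID=importuser;PWD=secret;

------------------------------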

Once I get this set up, it's pretty nice - choose select * from whatever_table and you've got your import. Make sure to check the column mappings, though, so nothing gets silently truncated in the translation.

Next - if that step went right - it will ask whether to execute now or create an SSIS package. Trust me, unless your project is small, choose to create an SSIS package - it saves you from re-entering all the settings above and lets you re-use the package for setting up db mirrors just by pointing it at a different DSN.

Oh yeah - save it to the filesystem.

So let's see if the process works. If all goes well - it works. If not, you get a cascade of nonsensical and totally useless errors spewing from the first totally useless error, which tells you something went wrong and hands you a hexadecimal error code. Thank god for Google - get ready to start examining field values in your original table for something slightly outside the "acceptable" range for MSSQL.

Things I've found it not to like mostly include dates (MySQL's zero dates and anything before 1753 won't fit in a SQL Server datetime) and longer text columns, which could end up as nvarchar or ntext depending on which side of the brain is currently controlling you.
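A couple of quick checks worth running on the MySQL side before the import - column and table names here are placeholders, and 1753-01-01 is the floor of SQL Server's datetime type:

---- below is an example ----

-- zero or pre-1753 dates won't fit in a SQL Server datetime column
SELECT id, event_date
FROM my_table
WHERE event_date = '0000-00-00' OR event_date < '1753-01-01';

-- long text values that might get mapped to too small an nvarchar
SELECT id, LENGTH(long_text_col) AS len
FROM my_table
ORDER BY len DESC
LIMIT 10;

------------------------------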

In the worst case, SQL 2005 complains about a constraint on a column definition you gave it - kind of ironic, don't you think? It let you put an invalid value into a constraint whose whole job is to stop users from entering invalid values!

Sunday, March 11, 2007

Introduction

Though I'm sure this isn't the only blog on the web about migrating MySQL to other databases, I'm starting this one hoping to spread some knowledge about the process of porting a MySQL database, and the application using it, to Microsoft SQL 2005.

It wasn't my decision to port this. In my opinion it's unnecessary, but as many outside the software industry know, with IT it's not about "if it ain't broke, don't fix it" - it's about who is selling to the CIO and stressing the importance of developing a 10-year IT plan - the same 10-year plan that changes every 2 years! It's no wonder the Micro$oft salesperson didn't leave any room for MySQL in the 10-year plan!

Anyway, enough ranting - I'm actually excited, because it's a chance for me to do my own comparison of what seem to be THE TWO most popular and most argued-about alternatives for "getting things done".

So stay tuned - the development box is coming this week and I'll be putting up my first impressions, pitfalls, and things that are cool. Remember - it's not about DB loyalty, it's about getting the job done quickly and accurately.

Cheers,

Ross