Posts Tagged unit testing
I had created some test data on a labmanager VM. It had taken some time — I had recorded a few calls, an email, and a chat, then tweaked the recording metadata to try out a few important aspects of the stored procedure I was testing.
It’s at these times that I get the nagging feeling — the test was so time-intensive to create and run, why not take the step of automating it so we can run it in the future and catch regressions ‘most instantly?
I wanted an SQL script that I could run that would INSERT the test data as it currently existed on my test machine. And I didn’t want to create those INSERT statements by hand. Boy, did I not want to write them all by hand.
Microsoft SQL Server 2005 comes with something called Database Publishing Wizard (look under Program Files\Microsoft SQL Server\90\Tools\Publishing\1.2\). With the following command:
sqlpubwiz script -d MyDatabase -dataonly c:\temp\MyDatabase.sql
…I was able to generate INSERT statements for all the data in my little test database to an .sql script file. I then edited the file to remove things I didn’t need (and to avoid key collisions by substituting
NEWID() calls for hardcoded uniqueidentifiers as necessary).
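That substitution could itself be scripted. Here is a sketch in Python of a filter that swaps quoted GUID literals for NEWID() calls; it is an assumption that a blanket replace-all is wanted (in practice you would spare any ids the tests still need to reference, as I did by hand):

```python
import re

# Matches a quoted uniqueidentifier literal, e.g. N'a0ca1a4f-0933-4565-b5da-d8fc3222a42a'
GUID_LITERAL = re.compile(r"N?'[0-9a-fA-F]{8}(-[0-9a-fA-F]{4}){3}-[0-9a-fA-F]{12}'")

def substitute_newid(sql):
    """Replace hardcoded uniqueidentifier literals with NEWID() calls."""
    return GUID_LITERAL.sub("NEWID()", sql)
```

Run over the generated .sql file, this avoids key collisions when the INSERTs are replayed against a database that already has data.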
Finally, with all the data in place, I could add the following tests. (If anything goes wrong, an error message appears in the Messages window.)
DECLARE @tempTable TABLE(
    RecordingId uniqueidentifier,
    BlahType int,
    BlahDateUTC DateTime,
    BlahLength int,
    LocalPartyName nvarchar(60),
    RightsMask int);
DECLARE @return_value int;

INSERT INTO @tempTable
    EXEC @return_value = [dbo].[blah_get_blah_chunk]
        @i_sessionId = @sessionId,
        @i_chunkSize = 50;
IF @return_value <> 0
    RAISERROR('Return value should be zero but was %d', 15, 1, @return_value);

DECLARE @expectedRecords int;
DECLARE @actualRecords int;
SET @expectedRecords = 5;
SET @actualRecords = (SELECT COUNT(*) FROM @tempTable);
IF @actualRecords <> @expectedRecords
    RAISERROR('Expected %d records in result set but found %d', 15, 1,
        @expectedRecords, @actualRecords);

DECLARE @expectedDisplayName nvarchar(60);
DECLARE @actualDisplayName nvarchar(60);
SET @expectedDisplayName = N'blah_user -';
SELECT @actualDisplayName =
    (SELECT LocalPartyName FROM @tempTable
     WHERE RecordingId = N'a0ca1a4f-0933-4565-b5da-d8fc3222a42a');
IF @actualDisplayName <> @expectedDisplayName
    RAISERROR('Expected display name to be "%s" but it was "%s"', 15, 1,
        @expectedDisplayName, @actualDisplayName);
GO
In the future we could have a script test runner that keeps track of the output and flags errors as part of the build… but one step at a time, right? This already takes test scenarios that took more than a day to set up and makes them runnable in a minute or two.
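A minimal sketch of what such a test runner could look like, assuming the script is executed through the sqlcmd utility and that failures show up in the output as RAISERROR messages (the database name and failure patterns here are illustrative, not our actual setup):

```python
import subprocess

def failures_in(output):
    """Collect output lines that look like test failures.
    SQL Server error output begins with 'Msg', and our RAISERROR
    messages contain the word 'Expected'."""
    return [line for line in output.splitlines()
            if line.startswith("Msg ") or "Expected" in line]

def run_sql_tests(script_path):
    """Run a test script through sqlcmd and report pass/fail."""
    result = subprocess.run(
        ["sqlcmd", "-d", "MyDatabase", "-i", script_path],
        capture_output=True, text=True)
    problems = failures_in(result.stdout + result.stderr)
    for line in problems:
        print(line)
    return not problems
```

A build server could call run_sql_tests() for each script and fail the build when any returns False.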
Database Publishing Wizard can also generate script to create the entire database schema, but it’s the data generation that will support my unit test needs.
Thanks to Pinalkumar Dave for his helpful write-up titled Generate Script with Data from Database – Database Publishing Wizard.
Automated unit testing is not yet to a mature state here on the C++ side. We’re kind of just getting started*. Part of just getting started is a lack of tool support. Funny – when Jon talked about the importance of maintaining a readiness to create tools that streamline the work in your specific project or environment, I wasn’t sure if he was on track or not. How important is it, really?
Then I realized that I’m doing the same thing in my current environment. So I guess either I’m as crazy as Jon** or Jon’s onto something here. :)
On to my issue. Our C++ unit test support is currently command-line based, and it puts out a lot of text to the screen. I found I was having to carefully cull through a few screens of data looking for what went right or wrong. It was too much work for each test run. I decided to colorize. ANSI escape sequences to the rescue! (putting to use our research from before.)
# chuiredgreenbar.py
#
# Purpose: Colorize the output of dmake runtest to highlight passing and failing tests.
# Author: Daniel Meyer
# Version: 0.1
# Date: 10/28/2009
#
# Usage:
#   dmake runtest 2>&1 | python c:\bin\chuiredgreenbar.py | cat
#
# TODO: Figure out a way to get ANSI escape sequence support without cat
# (on my XP machine, cmd.exe doesn't seem to natively support the ANSI
# escape sequences; cat works around this).
import sys, re

def green_bar(s):
    return chr(27) + "[32;40m" + s + chr(27) + "[0m"

def red_bar(s):
    return chr(27) + "[31;40m" + s + chr(27) + "[0m"

pass_pattern = re.compile("passed")
fail_pattern = re.compile("([0-9]+).+[0-9]+ (failed|aborted)")

for raw_line in sys.stdin:
    line = raw_line.rstrip("\n")
    result = fail_pattern.search(line)
    if result:
        # group(1) is a string, so compare against "0", not the integer 0
        if result.group(1) == "0":
            print(green_bar(line))
        else:
            print(red_bar(line))
    elif pass_pattern.search(line):
        print(green_bar(line))
    else:
        print(line)
The command line
Filter the output of dmake through this script with the following command line (cover your eyes):
dmake runtest 2>&1 | python c:\bin\chuiredgreenbar.py | cat
- The red background is not part of the highlighting – that’s another tool thing I won’t get into here.
- I barely know Python
- The super hacky part is having to pipe the output through cat to get the ANSI escape sequences interpreted (not to mention merging stderr in with stdout, since some of dmake’s output goes to one place and some to the other, and we need to process both)
This warty construction is “in production” on my PC – I use it ‘most every day. If it weren’t for something like this, I would probably still be straining my eyes to see which test cases failed. Perhaps eventually there will be a graphical UI for the Boost unit test output; but till then, this was an efficient way to fill a need.
*I could look at this negatively, but hey – I get to be part of bringing this discipline to my company, and besides that, there’s openness to the idea. Smells like opportunity to me!
**I would consider that designation an honor. We need more of that kind of crazy in the industry!
Someone recently asked me about mocks. As I hadn’t yet put my thoughts in writing, I do so here. Hopefully there’s some helpfulness scattered here somewhere…
“I guess we can only integration test this code – it pulls in the world”
As you start to write unit tests for your code, you quickly find that while a few methods are easy to test, you soon get bogged down with collaborators. “Oh, I guess I have to set up a real database server with test data out there somewhere or this method can’t be tested…” or “Rats, I don’t want my test to run the part that initiates a transaction with this remote internet service, because it will result in charges to the company”, or “These blocks of code only run if the collaborator throws some obscure error, and I don’t have any way of reliably creating that error condition.”
More generally, here are some of the challenges to dealing only with integration tests*:
- Setting up the infrastructure for the test takes time and effort – sometimes involving configuring and maintaining database servers or setting up accounts with third parties
- Tests tend to run slowly because they must talk to the database, place a phone call, really perform the complex transaction processing, etc.
- When a test breaks, additional time and effort is needed to isolate the cause, as there are many moving parts
- Due to the infrastructural setup required, the test itself is more complex and prone to bugs. Time must be spent diagnosing in which class the problem lies (potentially through many layers of the system, sometimes spanning servers, DLLs, and codebases)
Dealing gracefully with collaborators while unit testing is what mocks are all about.
*(See also the earlier article, Why not just integration tests?)
Dealing with the Collaborators
As the EasyMock page says:
Unit testing is the testing of software units in isolation. However, most units do not work alone, but they collaborate with other units. To test a unit in isolation, we have to simulate the collaborators in the test.
A Mock Object is a test-oriented replacement for a collaborator. It is configured to simulate the object that it replaces in a simple way. In contrast to a stub, a Mock Object also verifies whether it is used as expected.
The syntax varies with the mock framework; the HippoMocks 3.0 tutorial has a couple of short examples of setting expectations on a mocked collaborator.
And here’s a big reason I can’t ignore mocks: with a single expectation you can arrange for a collaborator’s method – say, c() – to throw a specified exception when it is called. This can be really valuable in testing your error handling – you don’t have to laboriously set up a complex net of conditions to get a subsystem to really throw an exception – you just tell the mock object to throw it and that’s that.
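HippoMocks aside, the same idea can be sketched with Python’s standard unittest.mock; the Subsystem class and the collaborator’s c() method here are made up for illustration:

```python
from unittest import mock

class Subsystem:
    """Unit under test: calls its collaborator and translates errors."""
    def __init__(self, collaborator):
        self.collaborator = collaborator

    def do_work(self):
        try:
            return self.collaborator.c()
        except ConnectionError:
            return "fallback"

collaborator = mock.Mock()
collaborator.c.side_effect = ConnectionError("simulated outage")

subsystem = Subsystem(collaborator)
assert subsystem.do_work() == "fallback"  # error path exercised, no real outage needed
collaborator.c.assert_called_once()       # unlike a stub, the mock also verifies usage
```

One line of configuration exercises an error path that would otherwise require a real, hard-to-reproduce failure.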
Mocks not a replacement for integration tests
Mocks aren’t a total replacement for integration tests. Many times the problem is neither in class A nor in class B, but in their collaboration. Integration tests are valuable to prove that the collaboration works as expected. But it is faster and easier to diagnose a problem in the collaboration of units that have been thoroughly unit tested in isolation. To put it the opposite way, when you’re working with a group of collaborating units that haven’t been thoroughly tested in isolation, “the problem could be anywhere”!
Limitations and downsides of mocking
It’s a Simulation
When using mocks to test your unit in isolation, you’re simulating interactions with the unit’s collaborators. A risk inherent in simulation is the risk that the simulated interaction diverges from interactions that use the real collaborator.
Coupling between the test and the collaborators’ implementation
In insulating a unit from its collaborators, mocks couple your tests to those collaborators’ interfaces. When you change the way calls to a collaborator work, you have to adjust the mocks’ expectations as well. This coupling can decrease the flexibility of your code if not understood and managed (the “Mock shortcomings” section of this answer to “When should I mock?” gives examples of this problem).
I’ve experienced this kind of brittleness, and it is an important consideration. My hope is that such brittleness can be avoided by relaxing the expectations down to the minimum needed (see for instance the “2. More Complex Stuff” section of the HippoMocks 3.0 Tutorial). It’s a challenge to do this right the first time, though, because the brittleness of a test using mocks may not be apparent to you until time passes and the collaborator changes its implementation, breaking a test. It may be a learning process for mock users, to develop a standard practice for the right kind of looseness that supports maintainable tests.
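One way to relax expectations down to the minimum, sketched with Python’s unittest.mock (the close_account/notifier names are hypothetical): pin only the arguments the test is actually about and wildcard the incidental ones.

```python
from unittest import mock

def close_account(account_id, notifier):
    # Hypothetical unit under test; the timestamp argument is incidental detail.
    notifier.send(account_id, "closed", timestamp="2009-10-28T12:00:00")

notifier = mock.Mock()
close_account(42, notifier)

# Brittle: pins every argument, so it breaks when the timestamp format changes:
#   notifier.send.assert_called_once_with(42, "closed", timestamp="2009-10-28T12:00:00")

# Looser: assert only the interaction this test cares about.
notifier.send.assert_called_once_with(42, "closed", timestamp=mock.ANY)
```

The looser assertion still proves the collaboration happened, without coupling the test to details the collaborator is free to change.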
Why Not Just Fakes, or Ad-Hoc Mocks?
There are ways to get independent unit testability without using a mocking framework.
- You can create your own fakes (i.e., create a testing implementation of your collaborator’s interface) — certainly handy at times
- You can create your own mocks without using a framework (see the article Should we use a mocking framework?)
The thing to watch with these approaches is the amount of boilerplate code required. In my experience in Java-land, using EasyMock in conjunction with Unitils reduced the boilerplate to a really small amount. It was great.
I’ve written up some initial results on that front as well.
While using mocks without care can lead to brittle tests, their power for easily testing how the unit under test responds to interactions with a collaborator cannot be ignored. I hypothesize that by setting up the most relaxed expectations that fulfill the test’s need, the brittleness problem can be avoided; but more study (or at least more Googling) needs to be done to prove or disprove this. In the meantime, I believe the right course of action is to proceed (with our eyes open) with the use of mocks, since the current alternatives, including having integration tests but no unit tests, leave certain code untested. I don’t see how it could be otherwise…
Shoot, now I need to find a good C++ code coverage tool, to prove it! : )
We didn’t want to spend time agonizing over whether a test should be considered a unit test or an integration test. We wanted an easy way to decide, such that the resulting division of tests would meet our needs. Here are some guidelines I wrote up to that end a couple of years ago, as a companion article to the Why not just integration tests article.
So you’re writing a code-based test, and first off you need to decide whether it should be considered a unit test or an integration test so that you know how to name the test class. How do you decide?
There are two things we want to be true about a thing we call a unit test[1]:
- It runs fast
- It is geared toward error localization
To consider a test a unit test, it needs to not take very long to run.
Why do we say that?
The reason we don’t want unit tests to take a long time is because we want to be able to run them often, during development. We want to be able to code a little, run our tests, code a little more, run our tests…. If our tests are too slow, it will break the rhythm and in practice we will just end up not running our tests very often.
Ok, so what kind of speed are we looking for?
|Unit Test Speed Guidelines||
|---|---|
|Ideally a unit test class will run in under…|0.1 second|
|We should aim for a unit test to run in…|< 1 second|
|Many of our unit tests may run in…|1–3 seconds|
|Some of our unit tests may take as long as…|~5 seconds|
If a test tends to take closer to 10 seconds to run, it needs to go in the integration test bucket.
Some tests have to do some setup that may take, say, 15 seconds the first time the test is run in a namespace, but then subsequent runs take less than one second. That is ok, the slow first run is not likely to be a barrier to people running the test often since subsequent runs are fast.
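One way to get that shape is to cache the expensive setup so that only the first run pays for it. A sketch using Python’s unittest (the test class name and fixture contents are stand-ins; the real setup would be the slow ~15-second data creation):

```python
import unittest

class BlahChunkTests(unittest.TestCase):
    _fixture = None  # cached so repeated runs skip the slow setup

    @classmethod
    def setUpClass(cls):
        # Stand-in for expensive one-time setup; it only runs when the
        # cache is empty, so subsequent runs are fast.
        if cls._fixture is None:
            cls._fixture = {"records": [1, 2, 3, 4, 5]}

    def test_chunk_has_expected_record_count(self):
        self.assertEqual(len(self._fixture["records"]), 5)

suite = unittest.defaultTestLoader.loadTestsFromTestCase(BlahChunkTests)
result = unittest.TextTestRunner(verbosity=0).run(suite)
```

Because the cost is paid once rather than per run, the slow first run isn’t a barrier to running the test often.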
There may be “artificial” conditions that make your test run’s duration approach 10 seconds, when in a more normal situation the test run duration would be much shorter. Use your judgment in these situations, keeping in mind that the main thing is to keep our body of unit tests fast enough to run often.
To consider a test a unit test, it must also be “geared toward error localization”.
- Error Localization
- When a failing test points you to the errant code that is responsible for the failure.
One industry guru puts it this way:
As tests get further from what they test, it is harder to determine what a test failure means. Often it takes considerable work to pinpoint the source of a test failure. You have to look at the test inputs, look at the failure, and determine where along the path from inputs to outputs the failure occurred. Yes, we have to do that for unit tests also, but often the work is trivial.[2]
So even if a test is fast, if the main assertion(s) of the test methods seem to be lost among all the setup and tear-down code, it probably belongs in the integration test bucket.
If your test is too slow to fall within the unit test speed guidelines, or if it is geared more toward broad coverage, then it belongs in the integration test bucket.
Manual test helpers
If your test can’t do everything automatically, but instead performs setup to make it quicker to perform manual testing, it’s a manual test helper, and it belongs in the manual test helper bucket.
System integrity tests
If your test tests delivered data rather than code (such as making sure that each delivered field has a table specified or that each delivered R-model relationship is owned by serial number 8500), it is a system integrity test, and it belongs in the system integrity test bucket.
TODO: Answer other questions about tests, such as:
- Is it ok to change system settings in a unit test? in an integration test? (yes, but the test should restore them to the way it found them)
- How strict should we be about a test cleaning up records, globals, log messages, etc. it creates? (Clean up whatever you can – accounts with payments on them are a known issue…)
- Sub-issue of the above: should we recommend that tests use transactions to undo complex things like account creation? What level of endorsement would we give: Is it preferred, acceptable where needed, or discouraged?
- Is a test class responsible for making sure two instances of itself don’t run concurrently, if it’s not thread-safe? Or should we assume that only one process at a time will be running tests in a namespace?
[1] We’re explaining what we will consider to be a unit test for our purposes, versus what we will consider to be an integration test. We’re not trying to address the question of what really is a unit test versus what really is an integration test. For instance, Michael Feathers says that if it writes to the database it’s not a unit test, but for our purposes if it writes to the database and it’s still fast enough, it can go in the unit test bucket.
[2] Michael Feathers, Working Effectively with Legacy Code, p. 12.
— DanielMeyer – 06 Dec 2007
The following is a mildly modified version of a post I made to our internal TWiki web a couple of years ago, under the title Why Not Just Integration Tests?
Do we really need actual UNIT tests? Why aren’t integration tests enough?
1. What’s an integration test?
An integration test:
- Tends to test more than one source file at a time
- May depend on other subsystems (e.g., may write to the database, need the workflow monitor to be running, perform an actual import, do an actual release load, need ECP set up between your computer and another, need a real dialer configured, etc.)
- Creates the data it needs for its test scenario (tables, imports, workflows, clients, accounts…)[1]
There are certain things that integration tests do better than unit tests:
- You don’t have to break as many dependencies to get the code under test. Instead, you take the code you’re changing together with all its dependencies
- They provide basic test coverage for large areas of the system, which is useful when great expanses of the codebase are not under test.
- They test how code components work together. This is something that is not in unit tests’ job description, and it’s an important item.
Integration tests are kind of like the Marines. They go in first and give some cover to the rest of the troops. But you can’t win a full-scale war with just the Marines – after they’ve established a beachhead, it’s time to send in the rest of the troops.
Integration tests do have disadvantages as well:
- They are harder to read and maintain. Because integration tests generally need to perform setup for the tested code’s dependencies, the code of integration tests tends to be thicker and harder to read. It’s easy to get lost in what is setup and what is the main test. And it can take more careful analysis to be sure the test itself doesn’t have a logic error in it.
- Their code coverage is low. Even if your integration test covers several scenarios, getting anywhere near complete code coverage is usually somewhere between tediously difficult and impossible. Running a whole scenario is just too coarse of a tool to get that kind of coverage. (As a sidenote, this is also one reason manual testing is not enough.)
- They tend to be slow-running (30 seconds to half an hour)[2].
- They take longer to write. In the short term, when testing legacy code, integration tests are still quicker to write than unit tests, since changing the production code to be unit testable takes time and effort. But once you break the dependencies of a class or routine for unit testing, future unit tests no longer pay that cost, and integration tests become the slower ones to write (probably by a large margin).
Integration tests are important and won’t be going away, but to get to the next level we need to be able to unit test individual classes[3].
- Dependency breaking
- Changing the production code to make it so that you can test one method at a time without having to have the workflow monitor running, doing an actual release load, etc.
- Test case
- a test class that tests a production class or routine. Note that the term test case refers to the whole test class, not just one method on the test class.
- Code coverage
- How many lines of the source file(s) being tested are exercised by unit tests or integration tests, expressed as a percentage. For example, 100% code coverage means that every executable line was executed by the tests. The lower this percent is, the more code we’re shipping that is never exercised by tests.
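The definition above amounts to a simple ratio; a tiny illustrative calculation (the line numbers are made up):

```python
def coverage_percent(executed_lines, executable_lines):
    """Coverage as defined above: executed lines / executable lines, as a percentage."""
    executable = set(executable_lines)
    executed = set(executed_lines) & executable
    return 100.0 * len(executed) / len(executable)

# Tests exercised 3 of the 4 executable lines in the source file:
print(coverage_percent([10, 11, 12], [10, 11, 12, 13]))  # → 75.0
```

Real tools (a coverage profiler) collect the executed-line set automatically; the arithmetic is all that’s shown here.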
For Further Reading
The Trouble With Too Many Functional Tests – http://www.javaranch.com/unit-testing/too-functional.jsp
[1] Instead of having each integration test create and set up its own data, another approach is to have certain things already set up that the test environment can rely on. That has quite a bit of appeal – it would make the setup required by each integration test smaller (and they would run faster because of that); they would also be more readable and maintainable. The reason we haven’t gone with that alternative up to this point is that it’s really hard to know what side effects the production code your integration test runs may have on the system. It seemed safest to have each test be responsible for setting up what it needs. Perhaps we’ll revisit this decision in the future.
[2] 30 seconds may seem pretty quick for an integration test; but as we move toward the discipline of running existing tests often, and as that body of existing tests continues to grow, we’ll want the unit tests to be quick – like under half a second per test case. This will enable a continuous build server (for example) to run all unit tests after each commit to get near-instant feedback on whether we broke anything. The build server would still run integration tests, but because they tend to be long-running they may only be able to run once or twice a day.
[3] And routines… though that’s a bit harder, since our routines often aren’t really units of functionality – they’re more like a bag-o-labels… there’s certainly work to be done!
— DanielMeyer – 20 Oct 2007