Wednesday, June 13, 2012

How to Use Sampling to Eliminate Manual Census

I was chatting with the transportation director for a large organization about its commute program. Twice each year, he is required to take a census and capture each employee's transportation mode for 5 consecutive days. [1] Making his job more difficult, he must exclude students and visitors from the count.

Statistics Makes The Job Easier & More Accurate

Currently, human counters armed with clipboards complete the inventory. The captured data is then entered into a simple application that produces reports.

This census requirement is a great candidate for statistical sampling, which can make the job easier and less costly. Although the example is for a commute program, the 3-step process outlined in the next sections can be used broadly.

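The leverage point is the statistical “law of large numbers”, first proved by Jacob Bernoulli and sometimes called Bernoulli’s Law. It states that as a random sample grows, the proportion observed in the sample (for example, the share of people who take the bus) gets ever closer to the true proportion in the whole population. So when the population is large, a sufficiently large random sample gives nearly the same result as counting everyone.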
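A quick simulation can make the law of large numbers concrete. The sketch below assumes a hypothetical organization of 10,000 commuters, 30 per cent of whom ride transit, and draws simple random samples of increasing size with Python's standard `random` module. The function name `sample_share` and the population mix are illustrative assumptions, not data from the program.

```python
import random

def sample_share(population, mode, n, seed=0):
    """Share of `mode` in a simple random sample of n people (drawn without replacement)."""
    rng = random.Random(seed)  # seeded so the run is reproducible
    sample = rng.sample(population, n)
    return sum(1 for m in sample if m == mode) / n

# Hypothetical population: 10,000 commuters, 30% transit, 50% car, 20% bike.
population = ["transit"] * 3_000 + ["car"] * 5_000 + ["bike"] * 2_000
true_share = population.count("transit") / len(population)

for n in (10, 100, 1_000, 10_000):
    estimate = sample_share(population, "transit", n)
    print(f"sample of {n:>6,}: transit share {estimate:.3f} (true {true_share:.3f})")
```

Small samples can land well away from 30 per cent, larger ones generally land close to it, and a "sample" that includes everyone is simply the census and reproduces the true share exactly.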

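There may be small statistical variations when using random sampling, but it provides sufficient precision for most analyses. The precision depends mainly on the number of people sampled, not on the percentage of the population: a random sample of about 1,000 people estimates a proportion to within roughly ±3 percentage points at 95 per cent confidence. For a large organization, a 1 per cent sample can therefore be very accurate; a smaller organization needs a larger fraction.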
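To see how sample size drives precision, here is a minimal sketch of the standard normal-approximation margin of error for a proportion, with a finite population correction. It assumes the worst case p = 0.5 and 95 per cent confidence (z = 1.96); the organization sizes are hypothetical.

```python
import math

def margin_of_error(n, population_size, p=0.5, z=1.96):
    """Approximate margin of error for a sample proportion.

    Normal approximation with a finite population correction (FPC);
    p=0.5 gives the widest (most conservative) margin.
    """
    se = math.sqrt(p * (1 - p) / n)
    fpc = math.sqrt((population_size - n) / (population_size - 1))
    return z * se * fpc

for population_size in (1_000, 10_000, 100_000):
    n = population_size // 100  # a 1 per cent sample
    moe = margin_of_error(n, population_size)
    print(f"N={population_size:>7,}, n={n:>5,}: ±{moe * 100:.1f} percentage points")
```

For populations of 1,000, 10,000 and 100,000, a 1 per cent sample gives margins of roughly ±31, ±10 and ±3 percentage points, which is why the 1 per cent rule of thumb suits large organizations best.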

In the case of counting commuters, random sampling will likely improve accuracy, since manual counters may miss commuters, mistakenly count visitors or students, fail to account for commuters who are travelling or on vacation, and record the wrong transportation mode.

Step 1: Build Your Data Set

The sampling process would use HR data as its source. [2] It is critical to include every member of the population; otherwise the sample may be skewed.

The notorious failed prediction of the 1948 Presidential election provides an illustration. Pollsters had Thomas E. Dewey beating Harry S. Truman, only to be embarrassed by the actual results.

The trouble, as the story is usually told, was that the poll was conducted by phone. Not every household had a phone in 1948, and phone owners were more likely to be Republicans who favored Dewey, which skewed the sample.

In the case of a commuter inventory, contractor data often comes from a different source than employee data, so multiple sources may be required to find all the commuters.
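Combining sources is mostly a matter of merging on a unique id so that nobody is missed or counted twice. The sketch below is a minimal illustration, assuming each source can be exported as records with hypothetical keys `id`, `name`, `work_email` and `type`; the `build_frame` function and the sample records are invented for the example.

```python
def build_frame(employees, contractors, excluded_types=("student", "visitor")):
    """Merge employee and contractor records into one sampling frame.

    Records are dicts with hypothetical keys: id, name, work_email, type.
    """
    frame = {}
    for record in [*employees, *contractors]:
        if record.get("type") in excluded_types:
            continue  # students and visitors are not part of the census
        # Keying on the unique id means someone listed in both sources appears once.
        # Only the fields needed for polling are kept (see note 2).
        frame[record["id"]] = {
            "id": record["id"],
            "name": record["name"],
            "work_email": record["work_email"],
        }
    return list(frame.values())

employees = [
    {"id": "E1", "name": "Ana", "work_email": "ana@example.com", "type": "employee", "home_address": "1 Main St"},
    {"id": "E2", "name": "Ben", "work_email": "ben@example.com", "type": "employee", "home_address": "2 Main St"},
]
contractors = [
    {"id": "E2", "name": "Ben", "work_email": "ben@example.com", "type": "contractor"},  # in both sources
    {"id": "C7", "name": "Cy", "work_email": "cy@example.com", "type": "contractor"},
    {"id": "S3", "name": "Dee", "work_email": "dee@example.com", "type": "student"},
]
frame = build_frame(employees, contractors)
print(len(frame), [r["id"] for r in frame])
```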

Step 2: Randomly Select Your Sample

Once all the commuter data is compiled, the next step is to randomly select a subset for follow-up polling. Common tools can do this; Excel, for example, offers a Sampling tool in its Analysis ToolPak add-in. If the products you already own lack such a feature, random sampling software is available for a small fee or as open source.
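In a general-purpose language the selection is a one-liner. This minimal sketch uses Python's standard `random.sample`, which draws without replacement, so every person has the same chance of selection and nobody is picked twice. The `draw_sample` name, the id format and the seed value are illustrative assumptions.

```python
import random

def draw_sample(frame, fraction=0.01, seed=None):
    """Simple random sample, without replacement, of about `fraction` of the frame."""
    n = max(1, round(len(frame) * fraction))
    rng = random.Random(seed)  # recording the seed lets the draw be audited and repeated
    return rng.sample(frame, n)

frame = [f"ID{i:04d}" for i in range(1_000)]  # hypothetical unique ids
selected = draw_sample(frame, fraction=0.01, seed=2012)
print(len(selected), selected[:3])
```

Keeping the seed with the survey records makes it possible to show later that the selection was random and not hand-picked.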

It is critical to avoid special sampling criteria, or errors will be introduced. Letting some departments or executives skip the follow-up call would bias the results and invalidate the process.
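A small deterministic example shows why exemptions matter. Assume a hypothetical organization of 1,000 people in which the Executive department drives more than everyone else; the department names and mode mix are invented for illustration.

```python
def car_share(people):
    """Fraction of people whose commute mode is "car"."""
    return sum(1 for p in people if p["mode"] == "car") / len(people)

# Hypothetical organization of 1,000: executives drive more than engineers.
# Repeating the same dict objects is fine here because they are only read.
people = (
    [{"dept": "Executive", "mode": "car"}] * 80
    + [{"dept": "Executive", "mode": "transit"}] * 20
    + [{"dept": "Engineering", "mode": "car"}] * 400
    + [{"dept": "Engineering", "mode": "transit"}] * 500
)

exempted = [p for p in people if p["dept"] != "Executive"]
print(f"true car share: {car_share(people):.1%}")
print(f"car share if Executive is exempt: {car_share(exempted):.1%}")
```

The true share is 48 per cent, but once the Executive department is exempt, even a perfect random sample of the remaining frame converges on about 44 per cent. More sampling cannot fix a biased frame.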

Step 3: Poll Your Sample Set

The final step is to contact the individuals selected for the sample and record their commute mode for each day of the week. This could be automated by sending an email that explains the new process and how it reduces costs and improves accuracy. Introducing the process to department leaders before the follow-up could enlist their help in communicating it and chasing responses.
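Once replies come in, they can be tallied into mode shares. This is a minimal sketch assuming each reply is stored as a mapping from weekday to mode, with a hypothetical "away" category for travel and vacation days; `mode_shares` and the sample replies are illustrative, not part of any existing system.

```python
from collections import Counter

DAYS = ("Mon", "Tue", "Wed", "Thu", "Fri")

def mode_shares(responses):
    """Turn {person_id: {day: mode}} into the share of reported person-days per mode."""
    counts = Counter()
    for by_day in responses.values():
        for day in DAYS:
            mode = by_day.get(day)
            if mode is not None:  # skip days the person left blank
                counts[mode] += 1
    total = sum(counts.values())
    if total == 0:
        return {}
    return {mode: count / total for mode, count in counts.items()}

# Hypothetical replies; "away" covers travel and vacation days.
responses = {
    "E1": {"Mon": "transit", "Tue": "transit", "Wed": "car", "Thu": "transit", "Fri": "away"},
    "C7": {day: "bike" for day in DAYS},
}
shares = mode_shares(responses)
print(shares)
```

Multiplying each share by the number of commuters and by 5 days gives an estimate of person-days by mode for the whole organization.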

Of course, there will be laggards who do not respond, and a follow-up phone call may be necessary. In some cases, their managers may need to step in. Even with this extra effort, the work is much less intensive than counting each commuter. With a 1% sample of 1,000 commuters (10 people), even if 20% require follow-up, only 2 people out of 1,000 would need a call, which is manageable.
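The workload arithmetic generalizes easily. This sketch simply multiplies through; the 1 per cent sample and 20 per cent non-response defaults come from the paragraph above, while the function name is hypothetical.

```python
def follow_up_workload(population_size, sample_fraction=0.01, nonresponse_rate=0.20):
    """Return (people sampled, expected follow-up calls) for a given sampling plan."""
    sampled = round(population_size * sample_fraction)
    calls = round(sampled * nonresponse_rate)
    return sampled, calls

for size in (1_000, 10_000):
    sampled, calls = follow_up_workload(size)
    print(f"{size:,} commuters: poll {sampled}, follow up with about {calls} "
          f"(a manual census would count all {size:,})")
```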

Statistics - The 1% Solution

This new statistical approach provides one other benefit: because polling is cheap, it could be done more often to understand how commute patterns change. The organization could measure in different seasons to determine whether inclement winter weather reduces the number of bicycle commuters, or measure the effectiveness of new marketing programs.
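When comparing two polls, it helps to check whether a difference is larger than sampling noise. The post does not prescribe a method; one standard option is the pooled two-proportion z-test, sketched below under the assumption that the summer and winter polls are independent random samples. The counts are hypothetical.

```python
import math

def two_proportion_z(x1, n1, x2, n2):
    """Pooled two-proportion z statistic comparing x1/n1 with x2/n2."""
    p1, p2 = x1 / n1, x2 / n2
    pooled = (x1 + x2) / (n1 + n2)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    return (p1 - p2) / se

# Hypothetical polls: respondents who biked at least once during the survey week.
z = two_proportion_z(120, 1_000, 70, 1_000)  # summer vs. winter
print(f"z = {z:.2f}; difference significant at the 5% level: {abs(z) > 1.96}")
```

Here 12 per cent versus 7 per cent gives z of about 3.81, well beyond 1.96, so the winter drop would be unlikely to be a sampling fluke.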

A sustainable commute program is all about efficiency. Using the power of statistics makes the analysis of the commuter data just as efficient.   


1. Background on the commuter inventory requirements 
The sustainable commute program receives some government funding, which is common among these types of programs. The program was also part of an expansion agreement between the county and the organization: the county allowed the expansion only if the organization could avoid adding traffic to already clogged arteries.
2. Proper handling of data
Since this is HR data, pull only the data that is required, to limit information security risks. The data should be limited to employee name, unique ID, and work contact information.
