Data Mining Techniques

What is Data Mining?

Data mining is a technique used by computer scientists to uncover obscure relationships among huge amounts of data collected over time. The data may be details of customer purchases at a particular grocery store, as they are in this story, traffic flow through a particular intersection in a big city, or just about any other long-term stream of event data that you can imagine.

Let's stick with the grocery store example in the story. A single day's database can be thought of as a stream of data blocks, one per sales transaction, each called a record. Think of each record as a sales receipt: it has the date and time, the cash register number, the sales clerk's identification number, and every item that the customer purchased. The record also contains every discount coupon the customer used, whether they paid by cash, debit card, or credit card, and the number and brand of the card, if one was used. Finally, if the customer used a "store discount" card, the record contains a customer identification number, which can lead us to the name, address, telephone number and a lot of other information about that customer, including past purchasing history.
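To make the idea of a record concrete, here is a minimal Python sketch of one sales transaction. The class name, field names, and sample values are all made up for illustration; a real store's database would have its own layout.

```python
from dataclasses import dataclass, field
from datetime import datetime
from typing import Optional


@dataclass
class SaleRecord:
    """One sales transaction, modeled on a receipt (hypothetical field names)."""
    timestamp: datetime                  # date and time of the sale
    register_number: int                 # which cash register rang it up
    clerk_id: str                        # sales clerk's identification number
    items: list[str] = field(default_factory=list)    # every item purchased
    coupons: list[str] = field(default_factory=list)  # discount coupons used
    payment_method: str = "cash"         # "cash", "debit", or "credit"
    card_brand: Optional[str] = None     # brand of card, if one was used
    card_number: Optional[str] = None    # number of card, if one was used
    customer_id: Optional[str] = None    # set only when a store discount card was used


# Made-up sample record, not real data.
record = SaleRecord(
    timestamp=datetime(2024, 7, 4, 10, 15),
    register_number=3,
    clerk_id="C-017",
    items=["hot dogs", "buns", "ketchup"],
    payment_method="debit",
    card_brand="ExampleCard",
    customer_id="CUST-0042",
)
```

A day's database is then just a long list of such records, one per receipt.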

As you can see, this database, even if it contains only a day's worth of data, is very rich in detail! Let's have a look at some of the things we might do with these data to improve sales at the store.

Histograms:

A histogram is a bar chart. Each bar represents a count of something relative to the other bars on that chart. In this case, we used percentages instead of totals, but the effect is the same: you should be able to see any changes between the Before and After charts:

Before:

First note what percentage of customers (who bought hot dogs, buns or both that day) only bought hot dogs.

[Histogram: Before]

After:

Next note how the percentage of customers who bought only hot dogs is reduced by an amount equal to the increase in those who bought both hot dogs and buns. In the story, some of the buns were put by the hot dogs, and the customers who bought hot dogs added the conveniently located buns to their carts, increasing the store's sales!

[Histogram: After]
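The percentages behind two charts like these could be computed roughly as follows. This is only a sketch: the function names are hypothetical, each basket is assumed to be a simple list of item names, and the two tiny sample days are invented so the arithmetic is easy to follow.

```python
from collections import Counter


def basket_category(items):
    """Classify a basket as 'hot dogs only', 'buns only', 'both', or None."""
    has_dogs = "hot dogs" in items
    has_buns = "buns" in items
    if has_dogs and has_buns:
        return "both"
    if has_dogs:
        return "hot dogs only"
    if has_buns:
        return "buns only"
    return None  # this customer bought neither, so they are left off the chart


def category_percentages(baskets):
    """Percentage of hot dog and/or bun buyers that fall in each category."""
    counts = Counter()
    for items in baskets:
        category = basket_category(items)
        if category is not None:
            counts[category] += 1
    total = sum(counts.values())
    if total == 0:
        return {}
    return {category: 100.0 * n / total for category, n in counts.items()}


# Invented sample days, not real sales data.
before = [["hot dogs"], ["hot dogs"], ["hot dogs", "buns"], ["buns"], ["milk"]]
after = [["hot dogs"], ["hot dogs", "buns"], ["hot dogs", "buns"], ["buns"], ["milk"]]

before_pct = category_percentages(before)
after_pct = category_percentages(after)

# Print a rough text histogram, one '#' per 5 percentage points.
for label, pct in (("Before", before_pct), ("After", after_pct)):
    print(label)
    for category in ("hot dogs only", "both", "buns only"):
        value = pct.get(category, 0.0)
        print(f"  {category:14s} {'#' * int(value // 5)} {value:.0f}%")
```

In this invented sample, the "hot dogs only" bar drops by the same number of percentage points that the "both" bar rises, which is the pattern the two charts are meant to show.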

Clustering:

In our story, the expert already knows that customers who buy hot dogs may also buy buns. He discovered that through a data mining technique called clustering. You pick a variable, like buns, and the cluster detection software shows you what other items are likely to be purchased with it. Note that the answer is not always obvious, nor is it consistent across the country. In some sections of the U.S. it is customary to put ketchup on a hot dog, while people in other sections may consider it heresy! In still other regions, sauerkraut is a common addition. You need to discover these patterns from your own data, and try hard not to make prior assumptions.
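The "pick an item and see what is bought with it" idea can be sketched with a plain co-occurrence count. This is a simple stand-in, not a full clustering algorithm and not the software the expert in the story used; the function name and the sample baskets are invented for illustration.

```python
from collections import Counter


def bought_with(baskets, target):
    """For baskets containing `target`, return (item, share) pairs giving the
    fraction of those baskets that also contain each other item."""
    with_target = [set(items) for items in baskets if target in items]
    if not with_target:
        return []
    counts = Counter()
    for basket in with_target:
        counts.update(basket - {target})  # count every other item once per basket
    return [(item, n / len(with_target)) for item, n in counts.most_common()]


# Invented sample baskets, not real sales data.
baskets = [
    ["buns", "hot dogs", "ketchup"],
    ["buns", "hot dogs"],
    ["buns", "hot dogs", "sauerkraut"],
    ["buns", "milk"],
    ["milk", "eggs"],
]

result = bought_with(baskets, "buns")
for item, share in result:
    print(f"{item}: in {share:.0%} of baskets that contain buns")
```

Running the same count on data from stores in different regions is one way to see whether ketchup or sauerkraut shows up alongside hot dogs there, without assuming the answer in advance.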

Want More Information?

The above exercise was an example of pattern detection. Once we discovered a pattern, we wondered if we could influence a customer's buying behavior by simply moving one product close to another, so we did an experiment to prove that this was true. Finally, we put separate bar codes on the buns that are next to the hot dogs so we can know how many of the customers who bought both were influenced by their proximity.
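With separate bar codes, the share of hot-dog-and-bun customers who took their buns from the hot dog display can be worked out directly from the records. The sketch below assumes the two bar codes appear as two different item names; those names, the function name, and the sample baskets are all hypothetical.

```python
def proximity_share(baskets, dogs="hot dogs", shelf_buns="buns",
                    display_buns="buns (by hot dogs)"):
    """Among baskets with hot dogs and any buns, return the fraction whose
    buns carried the separate bar code used at the hot dog display."""
    both = [set(b) for b in baskets
            if dogs in b and (shelf_buns in b or display_buns in b)]
    if not both:
        return None  # nobody bought both, so there is nothing to measure
    from_display = sum(1 for b in both if display_buns in b)
    return from_display / len(both)


# Invented sample baskets, not real sales data.
baskets = [
    ["hot dogs", "buns (by hot dogs)"],
    ["hot dogs", "buns (by hot dogs)", "ketchup"],
    ["hot dogs", "buns"],
    ["hot dogs", "buns (by hot dogs)"],
    ["hot dogs"],
    ["buns"],
]

share = proximity_share(baskets)
print(f"{share:.0%} of customers who bought both took buns from the hot dog display")
```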

What else can Data Mining do for us? Enter "data mining" into your favorite Internet search engine to find out!

 
