What is Data Mining?
Data mining is a technique used by computer scientists to uncover obscure relationships among huge amounts
of data collected over time. The data may be details of customer purchases at a particular grocery store, as
they are in this story, traffic flow through a particular intersection in a big city, or just about any other
long-term stream of event data that you can imagine.
Let's stick with the grocery store example in the story. A single day's database can be thought of as
a stream of data blocks called records, one per sales transaction. Think of each record as a sales
receipt: it has the date and time, cash register number, the sales clerk's identification number, and every
item that the customer purchased. Also contained in the record is every discount coupon they used, whether
they paid by cash, debit card, or credit card, and the number and brand of the card, if one was used. Finally, if the customer
used a "store discount" card, the record contains a customer identification number, which can lead us to the
name, address, telephone number and a lot of other information about that customer, including past purchasing
history.
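To make the idea of a record more concrete, here is a minimal sketch of how one sales receipt might be
represented in Python. The class name, the field names, and the sample values are all made up for
illustration; a real store's database would have its own layout.

```python
from dataclasses import dataclass, field
from datetime import datetime
from typing import List, Optional


@dataclass
class SalesRecord:
    """One sales transaction, modeled on a receipt (hypothetical field names)."""
    timestamp: datetime                  # date and time of the sale
    register_number: int                 # which cash register rang it up
    clerk_id: str                        # the sales clerk's identification number
    items: List[str]                     # every item the customer purchased
    coupons: List[str] = field(default_factory=list)  # discount coupons used
    payment_method: str = "cash"         # "cash", "debit", or "credit"
    card_brand: Optional[str] = None     # filled in only when a card was used
    card_number: Optional[str] = None    # filled in only when a card was used
    customer_id: Optional[str] = None    # filled in only with a store discount card


# A made-up receipt: a cash sale with no coupons and no store discount card.
example_record = SalesRecord(
    timestamp=datetime(2024, 7, 4, 10, 15),
    register_number=3,
    clerk_id="C-017",
    items=["hot dogs", "buns", "ketchup"],
)
```

A day's database would then simply be a long list of such records, one per customer checkout.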
As you can see, this database, even if it contains only a day's worth of data, is very rich in detail! Let's
have a look at some of the things we might do with these data to improve sales at the store.
Histograms:
A histogram is a bar chart. The height of each bar shows a count of something, so the bars on a chart
can be compared with one another. In this case, we used percentages instead of totals, but the effect is the same: you should be able to
see any changes between the Before and After charts:
Before:
First note what percentage of customers (who bought hot dogs, buns or both that day) only
bought hot dogs.
After:
Next note how the percentage of customers who bought only hot dogs is reduced by an amount equal
to the increase in those who bought both hot dogs and buns. In the story, some of the buns were put
by the hot dogs, and the customers who bought hot dogs added the conveniently located buns to their
carts, increasing the store's sales!
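The percentages behind the Before and After charts could be worked out along the following lines. This is
only a sketch: the function names are our own, each customer's purchase is assumed to be a set of item
names, and the baskets are invented sample data, not the store's real numbers.

```python
from collections import Counter


def basket_category(items):
    """Classify one basket as 'hot dogs only', 'buns only', 'both', or None."""
    has_dogs = "hot dogs" in items
    has_buns = "buns" in items
    if has_dogs and has_buns:
        return "both"
    if has_dogs:
        return "hot dogs only"
    if has_buns:
        return "buns only"
    return None  # this customer bought neither, so they are left off the chart


def category_percentages(baskets):
    """Percentage of hot dog and/or bun customers falling in each category."""
    counts = Counter()
    for items in baskets:
        category = basket_category(items)
        if category is not None:
            counts[category] += 1
    total = sum(counts.values())
    if total == 0:
        return {}
    return {category: 100.0 * n / total for category, n in counts.items()}


# Invented sample baskets for the day before and the day after moving the buns.
before = [{"hot dogs"}, {"hot dogs"}, {"hot dogs", "buns"}, {"buns"}, {"milk"}]
after = [{"hot dogs"}, {"hot dogs", "buns"}, {"hot dogs", "buns"}, {"buns"}, {"milk"}]

before_pct = category_percentages(before)
after_pct = category_percentages(after)

# Print each chart as rows of text bars, one '#' per 5 percentage points.
for label, pct in (("Before", before_pct), ("After", after_pct)):
    print(label + ":")
    for category in ("hot dogs only", "buns only", "both"):
        value = pct.get(category, 0.0)
        print(f"  {category:14s} {'#' * int(value // 5)} {value:.0f}%")
```

In this invented sample, the "hot dogs only" bar shrinks by exactly as much as the "both" bar grows, which is
the same effect described above.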
Clustering:
In our story, the expert already knows that customers who buy hot dogs may also buy buns. He discovered that
through a data mining technique called clustering. You pick a variable, like buns, and the clustering
software will show you what other items are likely to be purchased with it. Note that the answer is
not always obvious, nor is it consistent across the country. In some sections of the U.S. it is customary to
put ketchup on a hot dog, while in others it may be considered heresy! And in some other regions, sauerkraut is a common
addition. You need to discover these patterns from your own data, and try hard not to make prior assumptions.
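The "what else is bought with buns?" question can be approximated with a very simple count. The sketch
below is not the expert's actual software and is not a full clustering algorithm; it just counts how often
each other item shows up in the same basket as the chosen item (an approach often called market basket
analysis). The function name and the sample baskets are made up for illustration.

```python
from collections import Counter


def bought_with(baskets, target):
    """For each other item, the fraction of `target` baskets that also contain it."""
    with_target = [set(items) for items in baskets if target in items]
    if not with_target:
        return {}
    counts = Counter()
    for items in with_target:
        counts.update(items - {target})  # count every companion item once per basket
    n = len(with_target)
    # most_common() lists the most frequent companions first
    return {item: count / n for item, count in counts.most_common()}


# Invented sample baskets.
baskets = [
    {"buns", "hot dogs", "ketchup"},
    {"buns", "hot dogs"},
    {"buns", "hamburger"},
    {"milk", "eggs"},
]

shares = bought_with(baskets, "buns")
for item, share in shares.items():
    print(f"{item}: in {share:.0%} of baskets that contain buns")
```

Running the same count on data from a different region might put sauerkraut or ketchup near the top, which
is why the patterns have to come from your own data.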
Want More Information?
The above exercise was an example of pattern detection. Once we discovered a pattern, we wondered if
we could influence a customer's buying behavior by simply moving one product close to another, so we did an
experiment to prove that this was true. Finally, we put separate bar codes on the buns that are next to the hot
dogs so we can know how many customers who bought both were influenced by their proximity.
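With separate bar codes, the proximity question becomes a simple count. In the sketch below, the two bun
codes ("buns-hotdog-display" and "buns-bakery-aisle") and the sample baskets are invented for illustration;
the real codes would be whatever the store assigns.

```python
def proximity_share(baskets,
                    display_buns="buns-hotdog-display",
                    aisle_buns="buns-bakery-aisle"):
    """Among customers who bought hot dogs and any buns, the fraction whose
    buns came from the display next to the hot dogs (hypothetical bar codes)."""
    both = [b for b in baskets
            if "hot dogs" in b and (display_buns in b or aisle_buns in b)]
    if not both:
        return None  # nobody bought both, so there is nothing to measure
    from_display = sum(1 for b in both if display_buns in b)
    return from_display / len(both)


# Invented sample baskets.
sample = [
    {"hot dogs", "buns-hotdog-display"},
    {"hot dogs", "buns-hotdog-display", "ketchup"},
    {"hot dogs", "buns-hotdog-display"},
    {"hot dogs", "buns-bakery-aisle"},
    {"hot dogs"},
    {"buns-bakery-aisle"},
]

share = proximity_share(sample)
print(f"{share:.0%} of hot-dog-and-bun customers took buns from the display")
```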
What else can Data Mining do for us? Enter "data mining" into your favorite Internet search engine to
find out!