AUTHORED BY DONALD C. GILLETTE, PH.D., DATA CONSULTANT @ GUIDEIT
Last week we discussed data mining. I shared a query using census data and discussed how data mining is of great value in creating Business Intelligence and driving new business. Today we will explore some great sites for data mining and how to do it.
We begin with one of the largest web service suppliers, Amazon. Amazon maintains the largest collection of remote computing services...unless you ask Google or Microsoft. All three provide cloud computing, big data services and mass storage. They also provide API access to large data mining sites. For example, Amazon provides access to many data sets, including the following:
- Climate Data
- Genome Data
- Material Safety Data Sheets
- Petroleum Public Data Set
Let’s explore a scenario where you work for a Health Insurance company and do Business Analytics. Your Marketing Department asks for your assistance in getting information to price a policy for a company that refines oil and gas. The prospective client provides the Marketing Department with the following information:
- List of all locations, with employee demographics (age, gender, etc.)
- List of all chemicals used by location
- List of all products refined at each location
- Several other relevant pieces of data
Where do you start? Using your Amazon account, create a repository for the relevant information. Then, take each data set, and apply it to the project. For example:
- Climate Data: Based on past experience in the Insurance market, I know that weather and climate affect health. The first query I build will create a cube that includes all the facts related to the climate in each location and its surrounding areas.
- Genome Data: My next cube will explore demographics, specifically gender and age. Knowing the average incidence of diseases (cancer, heart disease, etc.) by age and gender helps determine the risks involved in insuring this group.
- Material Safety Data Sheets & Petroleum Public Data Set: Combining these two, I can create a cube that lists the products refined and the chemicals used, as well as any known carcinogens.
- Additional Options: For this example, FICO scores are important. They affect cost and are pretty much non-negotiable in making quote decisions.
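The steps above can be sketched in a few lines of Python. This is only a minimal illustration, not the actual workflow: it treats a "cube" as facts aggregated along a single dimension (location) and then joins two cubes on that dimension. The plant names, fact rows, values, and the helper functions build_cube and combine_cubes are all hypothetical, invented for the example. Real data would come from the public data sets listed earlier.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical fact rows; every name and value here is made up for illustration.
climate_facts = [
    {"location": "Plant A", "year": 2012, "avg_temp_c": 27.5},
    {"location": "Plant A", "year": 2013, "avg_temp_c": 28.5},
    {"location": "Plant B", "year": 2012, "avg_temp_c": 11.0},
    {"location": "Plant B", "year": 2013, "avg_temp_c": 12.0},
]
chemical_facts = [
    {"location": "Plant A", "chemical": "benzene", "carcinogen": True},
    {"location": "Plant A", "chemical": "toluene", "carcinogen": False},
    {"location": "Plant B", "chemical": "toluene", "carcinogen": False},
]


def build_cube(rows, dimension, measure, aggregate):
    """Group rows by one dimension and aggregate one measure per group."""
    groups = defaultdict(list)
    for row in rows:
        groups[row[dimension]].append(row[measure])
    return {key: aggregate(values) for key, values in groups.items()}


def combine_cubes(**cubes):
    """Join named cubes on their shared dimension key (here, location)."""
    keys = set().union(*(cube.keys() for cube in cubes.values()))
    return {
        key: {name: cube.get(key) for name, cube in cubes.items()}
        for key in sorted(keys)
    }


# One cube per data set, each keyed by location.
climate_cube = build_cube(climate_facts, "location", "avg_temp_c", mean)
# Summing booleans counts the chemicals flagged as carcinogens.
carcinogen_cube = build_cube(chemical_facts, "location", "carcinogen", sum)

combined = combine_cubes(
    mean_temp_c=climate_cube,
    carcinogen_count=carcinogen_cube,
)

for location, facts in combined.items():
    print(location, facts)
```

A real cube would usually have several dimensions (location, year, age band, and so on) and would live in a database or OLAP tool rather than in Python dictionaries, but the idea of aggregating facts and then joining on a shared dimension is the same.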
By continuing these steps and combining cubes, I’m able to build a more complete picture. Now, when I meet with the Marketing Department, they have a broad analysis that allows them to determine the most cost-effective and comprehensive way to insure this client. It sounds complicated, and it is. But it’s one of the largest and most vital responsibilities of Business Intelligence.
How does your organization use data mining to solve business challenges?