For anyone starting out in data and database design, there are plenty of books and articles to help them get started.
But for those who have been doing database design for a while, it is always good to reflect on their approaches and techniques.
I recently attended a presentation by Steve Hoberman. If you ever get a chance to attend a data design presentation by him, I would highly recommend it.
In addition to his presentations and database design courses, he also writes for the website Information Management.
His series of articles can be found at
and his company website is
I was recently invited by Sandro Saitta, who runs the Data Mining Research blog (http://www.dataminingblog.com/), to write a guest blog post for him. The topic for this guest post was "Can Database Developers do Data Mining?"
The original post is available at Guest Post- Can Database Developers do Data Mining –
Here is the main body of the post
Over the past 20 to 30 years, Data Mining has been dominated by people with a background in Statistics. This is primarily due to the type of techniques employed in the various data mining tools. The purpose of this post is to highlight the possibility that database developers might be better suited to a data mining project than someone with a statistics background.
Let's take a look at the CRISP-DM lifecycle for data mining (Figure 1). Most people involved in data mining will be familiar with this lifecycle.
It is well documented that the first three steps in CRISP-DM can take up to 70% to 80% of the total project time. Why does it take so much time? Well, the data miner has to start learning about the business in question, explore the data that exists, re-explore the business rules and their understanding of them, and so on. Only then can they start the data preparation step.
Database developers within the organisation will have gathered a considerable amount of the required information because they would have been involved in developing the business applications. So a large saving in time can be achieved here, as they will already have most of the business and data understanding. They are well equipped to query the data and can get to the required data quicker. The database developers are also best equipped to perform the data preparation step.
If we skip on to the deployment step, again the database developers will be required to implement and deploy the selected data mining model in the production environment.
The two remaining steps, Modelling and Evaluation, are perhaps the two steps that database developers are less suited to. But with a bit of training on Data Mining techniques and how to evaluate data mining models, they would be well able to complete the full data mining lifecycle.
If we take the stages of CRISP-DM that a database developer is best suited to (Business Understanding, Data Understanding, Data Preparation and Deployment), this would equate to approximately 80% to 85% of the total project. With a little bit of training and upskilling, database developers are the best kind of person to perform data mining within their organisation.
I’ve recently had an article titled Oracle Data Miner Comes of Age accepted for the June edition of the UKOUG Oracle Scene magazine.
I’ve been thinking of ways to try to promote this article and I’ve decided I would create two videos and post them on YouTube.
The first video is a short one-minute introduction to the article, a taster of sorts. I’ve learned from my initial attempts at producing the video that:
- It is more difficult than it looks
- The camera on my laptop is not installed straight. That is why I’m looking to one side
- I need a better quality microphone
But perhaps the most interesting thing was that within a couple of hours of posting it up on YouTube (and not telling anyone about it), it was found and tweeted by Charlie Burger. Charlie is the Senior Director in charge of the Oracle Data Miner tool. He also very kindly tweeted about one of my blog postings on the New Features of Oracle Data Miner 11g R2.
You can find the introduction video to the article at
I will be posting a much longer video, based on the full article, over the next couple of weeks.
Distinct Partners have a new opening for an ETL/Data Warehouse Consultant.
Following a period of growth, we are now looking for experienced ETL professionals to join our consultancy team. If you are looking for a challenge in management consultancy and believe that you have the qualities to succeed in a dynamic and high growth consultancy environment, then we would love to hear from you.
- Data Integration skills (at least one of the following)
- Proprietary: Informatica, SAS Data Integration Studio, IBM DataStage, Oracle.
- Open Source: Talend, Postgres, MySQL, CUDA, Python.
- Strong querying, data analysis and data flow mapping skills (must)
- Data quality skills – checks, standardisation, householding etc (understanding)
- Data architecture skills (understanding)
- data modelling (normalisation, referential integrity etc)
- dimension modelling (dimensions, facts, SCDs etc)
- XML scripting and open source data integration skills (strong plus)
- Database/ETL performance tuning and programming skills
Full details can be found at
If you would like to apply for the job you can email your CV to
Gina Cassidy email@example.com
and mention you heard about the job from me
Over the past few years I have been contributing posts on Data Mining and Oracle Data Miner topics to the BI-Quotient blog.
Over the past few months I have decided to expand my blog postings to include all the things I’m currently doing or things that I find interesting. The main theme will be ‘Data is King’
The new blog will include posts on the following topics:
- Oracle Data Miner
- Data Mining
- Data Management
- My research
- Database Design
- and generally anything else that I find interesting and relating to Data.
This is where this blog comes into its own. This will be my main blog going forward. It will contain all my posts, including copies of those that I post on the BI-Quotient blog.
Today I got a phone call from Jennifer at the UKOUG office asking whether I would be interested in helping out with some (minor) editing of four articles for the June edition of Oracle Scene.
I will also have an article in this edition of Oracle Scene (a 5 page spread).
I’ve had a quick look through the four articles and they are an interesting bunch.
Oracle Scene will be holding elections over the coming months for a longer-term deputy editor. This will go out to the user community for a public vote. I might put my name forward for this.
Yesterday I received an email telling me that my presentation submission for VirtaThon (Virtual Conference for the Oracle, Java & MySQL Communities) has been accepted.
The presentation is titled Getting Started with Oracle Data Miner 11g R2.
I would really like to give an online demo of the tools or even to be able to show a view of the demo, but it looks like I may have to do it with good old PowerPoint.