Oracle

Oracle Code Online December 2017

Posted on

This week Oracle Code will be having an online event consisting of 5 tracks and with 3 presentations on each track.

This online Oracle Code event will be given in 3 different geographic regions on 12th, 13th and 14th December.

NewImage

I’ve been selected to give one of these talks, and I’ve given this talk at some live Oracle Code events and at JavaOne back in October.

The present is pre-recorded and I recorded this video back in September.

I hope to be online at the end of some of these presentations to answer any questions, but unfortunately due to changes with my work commitments I may not be able to be online for all of them.

The moderator for these events will take your questions (or you can send them to me here) and I will write a blog post answering all your questions.

Advertisements

My Oracle Open World 2017 Presentations

Posted on Updated on

Oracle Open World 2017 will be happening very soon (1st-5th October). Still lots to do before I can get on that plane to San Francisco.

This year I’ll be giving 2 presentations (see table below). One on the Sunday during the User Groups Sunday sessions. I’ve been accepted on the EMEA track. I then get a few days off to enjoy and experience OOW until Thursday when I have my second presentation that is part of JavaOne (I think!)

My OOW kicks off on Friday 29th September with the ACE Director briefing at Oracle HQ, after flying to SFO on Thursday 28th. This year it is only for one day instead of two days. I really enjoy this event as we get to learn and see what Oracle will be announcing at OOW as well as some things that will be coming out during the following few months.

Day Time Presentation Location
Sunday 13:45-14:30 SQL: One Language to Rule All Your Data [OOW SUN1238]

SQL is a very powerful language that has been in use for almost 40 years. SQL comes with many powerful techniques for analyzing your data, and you can analyze data outside the database using SQL as well. Using the new Oracle Big Data SQL it is now possible to analyze data that is stored in a database, in Hadoop, and in NoSQL all at the same time. This session explores the capabilities in Oracle Database that allow you to work with all your data. Discover how SQL really is the unified language for processing all your data, allowing you to analyze, process, run machine learning, and protect all your data.

Hopefully this presentation will be a bit of Fun! For those who have been working with the database for a long time, we can sometimes forget what we can really do. For those starting out in the career may not realise what the database can do. The presentation delivers an important message while having a laugh or two (probably at me).

Marriott Marquis (Golden Gate Level) – Golden Gate C1/C2
Thursday 13:45-14:30 Is SQL the Best Language for Statistics and Machine Learning?

[OOW and JavaOne CON7350]

Did you know that Oracle Database comes with more than 300 statistical functions? And most of these statistical functions are available in all versions of Oracle Database? Most people do not seem to know this. When we hear about people performing statistical analytics, we hear them talking about Excel and R, but what if we could do statistical analysis in the database without having to extract any data onto client machines? This presentation explores the various statistical areas available in Oracle Database and gives several demonstrations. We can also greatly expand our statistical capabilities by using Oracle R Enterprise with the embedded capabilities in SQL.

This presentation is just one of the 14 presentations that are scheduled for the Thursday! I believe this session is already fully booked, but you can still add yourself to the wait list.

Marriott Marquis (Golden Gate Level) – Golden Gate B

My flights and hotel have been paid by OTN as part of the Oracle ACE Director program. Yes this costs a lot of money and there is no way I’d be able to pay these costs. Thank you.

My diary for OOW is really full. No it is completely over booked. It is just mental. Between attending conference session, meeting with various product teams (we only get to meet at OOW), attending various community meet-ups, this year I get to attend some events for OUG leaders (representing UKOUG), spending some time on the EMEA User Group booth, various meetings with people to discuss how they can help or contribute to the UKOUG, then there is Oak Table World, trying to check out the exhibition hall, spend some time at the OTN/ODC hangout area, getting a few OTN t-shirts, doing some book promotions at the Oracle Press shop, etc., etc., etc. I’m exhausted just thinking about it. Mosts days start at 7am and then finish around 10pm.

I’ll need a holiday when I get home! but it will be straight back to work 😦

If you are at OOW and want to chat then contact me via DM on Twitter or WhatsApp (these two are best) or via email (this will be the slowest way).

I’ll have another blog post listing the presentations from various people and partners from the Republic of Ireland who are speaking at OOW.

My Oracle Open World 2017 Presentations

Posted on

Oracle Open World 2017 will be happening very soon (1st-5th October). Still lots to do before I can get on that plane to San Francisco.

This year I’ll be giving 2 presentations (see table below). One on the Sunday during the User Groups Sunday sessions. I’ve been accepted on the EMEA track. I then get a few days off to enjoy and experience OOW until Thursday when I have my second presentation that is part of JavaOne (I think!)

My OOW kicks off on Friday 29th September with the ACE Director briefing at Oracle HQ, after flying to SFO on Thursday 28th. This year it is only for one day instead of two days. I really enjoy this event as we get to learn and see what Oracle will be announcing at OOW as well as some things that will be coming out during the following few months.

table.myTable { border-collapse:collapse; }
table.myTable td, table.myTable th { border:1px solid black;padding:5px; }

Day Time Presentation Location
Sunday 13:45-14:30 SQL: One Language to Rule All Your Data [OOW SUN1238]

SQL is a very powerful language that has been in use for almost 40 years. SQL comes with many powerful techniques for analyzing your data, and you can analyze data outside the database using SQL as well. Using the new Oracle Big Data SQL it is now possible to analyze data that is stored in a database, in Hadoop, and in NoSQL all at the same time. This session explores the capabilities in Oracle Database that allow you to work with all your data. Discover how SQL really is the unified language for processing all your data, allowing you to analyze, process, run machine learning, and protect all your data.

Hopefully this presentation will be a bit of Fun! For those who have been working with the database for a long time, we can sometimes forget what we can really do. For those starting out in the career may not realise what the database can do. The presentation delivers an important message while having a laugh or two (probably at me).

Marriott Marquis (Golden Gate Level) – Golden Gate C1/C2
Thursday 13:45-14:30 Is SQL the Best Language for Statistics and Machine Learning?

[OOW and JavaOne CON7350]

Did you know that Oracle Database comes with more than 300 statistical functions? And most of these statistical functions are available in all versions of Oracle Database? Most people do not seem to know this. When we hear about people performing statistical analytics, we hear them talking about Excel and R, but what if we could do statistical analysis in the database without having to extract any data onto client machines? This presentation explores the various statistical areas available in Oracle Database and gives several demonstrations. We can also greatly expand our statistical capabilities by using Oracle R Enterprise with the embedded capabilities in SQL.

This presentation is just one of the 14 presentations that are scheduled for the Thursday! I believe this session is already fully booked, but you can still add yourself to the wait list.

Marriott Marquis (Golden Gate Level) – Golden Gate C3

My flights and hotel have been paid by OTN as part of the Oracle ACE Director program. Yes this costs a lot of money and there is no way I’d be able to pay these costs. Thank you.

My diary for OOW is really full. No it is completely over booked. It is just mental. Between attending conference session, meeting with various product teams (we only get to meet at OOW), attending various community meet-ups, this year I get to attend some events for OUG leaders (representing UKOUG), spending some time on the EMEA User Group booth, various meetings with people to discuss how they can help or contribute to the UKOUG, then there is Oak Table World, trying to check out the exhibition hall, spend some time at the OTN/ODC hangout area, getting a few OTN t-shirts, doing some book promotions at the Oracle Press shop, etc., etc., etc. I’m exhausted just thinking about it. Mosts days start at 7am and then finish around 10pm.

I’ll need a holiday when I get home! but it will be straight back to work 😦

If you are at OOW and want to chat then contact me via DM on Twitter or WhatsApp (these two are best) or via email (this will be the slowest way).

I’ll have another blog post listing the presentations from various people and partners from the Republic of Ireland who are speaking at OOW.

Cluster Distance using SQL with Oracle Data Mining – Part 4

Posted on Updated on

This is the fourth and last blog post in a series that looks at how you can examine the details of predicted clusters using Oracle Data Mining. In the previous blog posts I looked at how to use CLUSER_ID, CLUSTER_PROBABILITY and CLUSTER_SET.

In this blog post we will look at CLUSTER_DISTANCE. We can use the function to determine how close a record is to the centroid of the cluster. Perhaps we can use this to determine what customers etc we might want to focus on most. The customers who are closest to the centroid are one we want to focus on first. So we can use it as a way to prioritise our workflows, particularly when it is used in combination with the value for CLUSTER_PROBABILITY.

Here is an example of using CLUSTER_DISTANCE to list all the records that belong to Cluster 14 and the results are ordered based on closeness to the centroid of this cluster.

SELECT customer_id, 
       cluster_probability(clus_km_1_37 USING *) as cluster_Prob,
       cluster_distance(clus_km_1_37 USING *) as cluster_Distance
FROM   insur_cust_ltv_sample
WHERE   cluster_id(clus_km_1_37 USING *) = 14
order by cluster_Distance asc;

Here is a subset of the results from this query.

NewImage

When you examine the results you may notice that the records that is listed first and closest record to the centre of cluster 14 has a very low probability. You need to remember that we are working in a N-dimensional space here. Although this first record is closest to the centre of cluster 14 it has a really low probability and if we examine this record in more detail we will find that it is at an overlapping point between a number of clusters.

This is why we need to use the CLUSTER_DISTANCE and CLUSTER_PROBABILITY functions together in our workflows and applications to determine how we need to process records like these.

Cluster Sets using SQL with Oracle Data Mining – Part 3

Posted on Updated on

This is the third blog post on my series on examining the Clusters that were predicted by an Oracle Data Mining model. Check out the previous blog posts.

In the previous posts we were able to list the predicted cluster for each record in our data set. This is the cluster that the records belonged to the most. I also mentioned that a record could belong to many clusters.

So how can you list all the clusters that the a record belongs to?

You can use the CLUSTER_SET SQL function. This will list the Cluster Id and a probability measure for each cluster. This function returns a array consisting of the set of all clusters that the record belongs to.

The following example illustrates how to use the CLUSTER_SET function for a particular cluster model.

SELECT t.customer_id, s.cluster_id, s.probability
FROM   (select customer_id, cluster_set(clus_km_1_37 USING *) as Cluster_Set
        from   insur_cust_ltv_sample 
        WHERE  customer_id in ('CU13386', 'CU100')) T,
      TABLE(T.cluster_set) S
order by t.customer_id, s.probability desc; 

The output from this query will be an ordered data set based on the customer id and then the clusters listed in descending order of probability. The cluster with the highest probability is what would be returned by the CLUSTER_ID function. The output from the above query is shown below.

NewImage

If you would like to see the details of each of the clusters and to examine the differences between these clusters then you will need to use the CLUSTER_DETAILS function (see previous blog post).

You can specify topN and cutoff to limit the number of clusters returned by the function. By default, both topN and cutoff are null and all clusters are returned.

– topN is the N most probable clusters. If multiple clusters share the Nth probability, then the function chooses one of them.

– cutoff is a probability threshold. Only clusters with probability greater than or equal to cutoff are returned. To filter by cutoff only, specify NULL for topN.

You may want to use these individually or combined together if you have a large number of customers. To return up to the N most probable clusters that are greater than or equal to cutoff, specify both topN and cutoff.

The following example illustrates using the topN value to return the top 4 clusters.

SELECT t.customer_id, s.cluster_id, s.probability
FROM   (select customer_id, cluster_set(clus_km_1_37, 4, null USING *) as Cluster_Set
        from   insur_cust_ltv_sample 
        WHERE  customer_id in ('CU13386', 'CU100')) T,
      TABLE(T.cluster_set) S
order by t.customer_id, s.probability desc;

and the output from this query shows only 4 clusters displayed for each record.

NewImage

Alternatively you can select the clusters based on a cut off value for the probability. In the following example this is set to 0.05.

SELECT t.customer_id, s.cluster_id, s.probability
FROM   (select customer_id, cluster_set(clus_km_1_37, NULL, 0.05 USING *) as Cluster_Set
        from   insur_cust_ltv_sample 
        WHERE  customer_id in ('CU13386', 'CU100')) T,
      TABLE(T.cluster_set) S
order by t.customer_id, s.probability desc;

and the output this time looks a bit different.

NewImage

Finally, yes you can combine these two parameters to work together.

SELECT t.customer_id, s.cluster_id, s.probability
FROM (select customer_id, cluster_set(clus_km_1_37, 2, 0.05 USING *) as Cluster_Set
from insur_cust_ltv_sample
WHERE customer_id in (‘CU13386’, ‘CU100’)) T,
TABLE(T.cluster_set) S
order by t.customer_id, s.probability desc;

Examining predicted Clusters and Cluster details using SQL

Posted on

In a previous blog post I gave some details of how you can examine some of the details behind a prediction made using a classification model. This seemed to spark a lot of interest. But before I come back to looking at classification prediction details and other information, this blog post is the first in a 4 part blog post on examining the details of Clusters, as identified by a cluster model created using Oracle Data Mining.

The 4 blog posts will consist of:

  • 1 – (this blog post) will look at how to determine the predicted cluster and cluster probability for your record.
  • 2 – will show you how to examine the details behind and used to predict the cluster.
  • 3 – A record could belong to many clusters. In this blog post we will look at how you can determine what clusters a record can belong to.
  • 4 – Cluster distance is a measure of how far the record is from the cluster centroid. As a data point or record can belong to many clusters, it can be useful to know the distances as you can build logic to perform different actions based on the cluster distances and cluster probabilities.

Right. Let’s have a look at the first set of these closer functions. These are CLUSTER_ID and CLUSTER_PROBABILITY.

CLUSER_ID : Returns the number of the cluster that the record most closely belongs to. This is measured by the cluster distance to the centroid of the cluster. A data point or record can belong or be part of many clusters. So the CLUSTER_ID is the cluster number that the data point or record most closely belongs too.

CLUSTER_PROBABILITY : Is a probability measure of the likelihood of the data point or record belongs to a cluster. The cluster with the highest probability score is the cluster that is returned by the CLUSTER_ID function.

Now let us have a quick look at the SQL for these two functions. This first query returns the cluster number that each record most strong belongs too.

SELECT customer_id, 
       cluster_id(clus_km_1_37 USING *) as Cluster_Id, 
FROM   insur_cust_ltv_sample
WHERE  customer_id in ('CU13386', 'CU6607', 'CU100');

NewImage

Now let us add in the cluster probability function.

SELECT customer_id, 
       cluster_id(clus_km_1_37 USING *) as Cluster_Id,
       cluster_probability(clus_km_1_37 USING *) as cluster_Prob       
FROM   insur_cust_ltv_sample
WHERE  customer_id in ('CU13386', 'CU6607', 'CU100');

NewImage

These functions gives us some insights into what the cluster predictive model is doing. In the remaining blog posts in this series I will look at how you can delve deeper into the predictions that the cluster algorithm is make.

Oracle Data Visualization Desktop – now available

Posted on Updated on

After a bit of a long wait Oracle have finally release Oracle Data Visualization for the desktop. The desktop version of this tool is only available for Windows desktops at the moment. I’m sure Oracle will be bringing out versions of other OS soon (I hope).

To get you hands on the Oracle Data Visualization to to the following OTN webpage (click on this image)

NewImage

After downloading has finished, you can run the installer.

When the Oracle Installer opens you will be prompted to enter the required details or to accept the defaults, as outlined below.

NewImage

  • Installation Location : Decide where you are going to have the Oracle Data Visualization tool installed on your desktop. The default location is C:\Program Files\Oracle Data Visualization Desktop . Click Next
  • Options : There are 2 check boxes for ‘Create desktop shortcut’ and ‘Deploy samples’. Leave both of these checked, as you will probably want these. Click Next.
  • Summary : Lists a summary of the installation. There is nothing really for you to do here, so on the Install button.
  • NewImage

  • Progress : You can ten sit back and monitor the progress of the installation. The installation tool about 4 minutes on my small Windows VM
  • NewImage

When the installation is complete you can now fire up Oracle Data Visualization and enjoy. If you have just installed the tool it will automatically be started for you.

NewImage

When the tool has finished all the configurations that it needs to do, the tool will open with the following window and shows a sample projects for you to get an idea of some of the things that are possible.

NewImage

For more details on the tool and on the Oracle Cloud hosted version click on the following image to get to the Oracle webpage for the product.

NewImage