SQL Developer

How to speed up your Oracle Data Mining with in-memory and parallel

Posted on Updated on

Have you have found running a workflow in Oracle Data Miner slow or running the scripts in the database slow ?

No. Good, because I haven’t found it slow.

But (there is always a but) it really depends on the volume of data your are dealing with. For the vast majority of us who aren’t of the size of google, amazon, etc have data volumes that are not that large really and a basic server can process many millions of records extremely quickly using Oracle Data Mining.

But what if we have a large volume of data. In one recent project I had a data set containing over 3.5 billion records. Now that is big data. All of this data sitting in an Oracle Database.

So how can we process over 3.5 billion records in a couple of seconds, building 4 machine learning models in that time? Is that really possible with just using an Oracle Database? Yes is the answer and very easily. (Surely I needed Hadoop and Spark to process this data? Nope!)

The Oracle Data Miner (ODMr) tool comes with a new feature in SQL Developer 4 (and higer) that allows you to manage using Parallel execution and the in-memory DB features. These can be accessed on the ODMr Worksheet tool bar.

NewImage

The best time to look at these setting is when you have created your workflow and are ready to run it for the first time. When you click on the ‘Performance Options’ link, you will get the following window. It will display the list of nodes you have in the workflow and will then indicate if the Degree of Parallel and the In-Memory options can be set for each of the nodes.

NewImage

The default values are shown and you can changes these. For example, in a lot of scenarios you might prefer to leave the Degree of Parallel as System Determined. This will then use whatever the the default is for the database and controlled by the DBA, but if you want to specify a particular value then you can, for example setting the degree of parallel to 4 for the ‘Class Build’ node, in the above image. Similarly for the in-memory option, this will only be available for nodes where the in-memory option would be applicable. This will be where there is a lot of data processing (preparing data, transforming data, performing specific statistics, etc) and for storing any data that is generated by Oracle Data Mining.

But what if you want to change the default values. You can change these at a global level within the SQL Developer Preferences. Here you can set the default to be used for each of the different types of Oracle Data Mining nodes.

NewImage

I mentioned at the start that I’ve been able to build 4 machine learning models using Oracle Data Mining on a data set of over 3.5 billion records, all in a couple of seconds. In my scenario Parallel was set to 16 and we didn’t use in-memory as we didn’t have the licence for it. You can see that machine learning at lighting speed (ish) is possible. This timing is only for building the models, which is the step that consumes the most about of resources and time. When it comes to scoring the data, that is lighting fast. In may scenario, scoring over 300,000 was less than a second, and I didn’t use parallel or anything else to speed things up. Because we didn’t need to.

Go give it a try!

Advertisements

Scheduling ODMr Workflows in SQL Developer 4.2+

Posted on Updated on

A new feature for Oracle Data Mining (ODM) (part of SQL Developer 4.2) is the ability to schedule an ODM workflow to run a defined time or frequency.

This blog post will bring you through the steps need to schedule an ODM workflow using this new feature.

The first thing that you need is an ODMr workflow. The following image is a familiar looking one that I typically use to get a very quick demo of how easy it is to build a machine learning workflow.

NewImage

Just above the workflow worksheet we have a row of icon buttons. In the above image one of these is highlighted by a red box. This is the workflow scheduler. So go ahead on click on it.

NewImage

In most cases you will want to run the entire workflow. The default option presented to is ‘All Nodes’. If you would only like a subset of the nodes to run, you can click-on or select the node in the workflow and then click on the scheduler icon. In our example we are going to run the entire workflow, so select ‘All Nodes’ from the menu.

NewImage

The main scheduler window will open. Here you can set the Start Date and time of the first run, what the Repeat frequency is (none, every day, every week or custom) and to End the Repeat (Never, After, On Date). To schedule a once off run of the workflow just set the Date and Time, set the Repeat to ‘None’ and End Repeat should disappear in this instance. If Repeat was set to another value then you can set a value for End Repeat.

Go ahead and run the scheduler by clicking on the OK button.

NewImage

A Scheduled Jobs window should open that will display the details of the scheduled job. When this job is run in the database, this will be shown in the Workflow Jobs window. Here you can see and monitor the progress of the of the workflow.

NewImage

and that’s it. Nice an simple.

But there is a something you needed to be WARNED about. When you schedule a workflow, Oracle Data Miner will lock the workflow. This is to ensure that no changes can be made to the scheduled workflow. This is indicated with the Locked button appearing on the icon menu. If you click on this button to unlock the workflow, it will also cancel your scheduled jobs associated with this workflow.

NewImage

Also when the scheduled workflow is finished, the workflow will remain locked. So you will have to click on this Locked button to unlock the workflow.

There are a few additional advanced features. These can be found by clicking on the ‘Advanced…’ button in the main scheduler window. The first table displayed allows you to specify if you want an email sent for the different stages of the scheduled job. The second tab allows you to set the Job Priority, Max Failures, Max Run Duration and Schedule Limits.

NewImage

Oracle Data Miner (ODM 4.1) New Features

Posted on

With the release of SQL Developer 4.1 we also get a number of new features with Oracle Data Miner (ODMr). These include:

  • Data Source node can now include data sources that contain JSON data, generating JSON schema and has a JSON viewer
  • Create Table can now create data in JSON
  • JSON Query Node allows you to view, query and process JSON data, combine it with relational data, generate sub-group by, and nested columns to be part of input to algorithms
  • New PL/SQL APIs for managing Data Miner projects and workflows. This includes run, cancel, rename, delete, import and export of workflows using PL/SQL.
  • New ODMr Repository views that allows us to query and monitor our workflows.
  • Transformation Node now allows you different ways of handling NULLS.
  • Transformation Node now allows us to create Custom Bins, define bin labels and bin values
  • Overall Workflow and ODMr environment improvements to allow for greater efficiency in workflow behaviour and interactions with the database. So using ODMr should feel quicker and more responsive.

What out for the Gotchas: Although support for JSON has been added to ODMr, as outlined above, you are still a bit limited to what else you can do with your JSON data. Based on the documentation you can use JSON data in the Association and Classification build nodes.

I’m not sure about the other nodes and this will need a bit of investigation to see what nodes can and cannot use JSON data. I’m sure this will all be sorted out in the next release.

Keep an eye out for some blog posts over the coming weeks on how to explore and use these new features of Oracle Data Miner.

RIP SQL*Plus & hello SQL Command Line

Posted on

Over the past couple of months Oracle has been releasing some EA (Early Adopter) versions of a new tool that is currently called SQL Command Line.

The team behind this new tool is the SQL Developer development team and they have been working on creating a new command line SQL tool that is based on some of the technology that is included in SQL Developer.

SQL Command Line in an stand alone tool and all you need to do is to download and un-zip the tile.

What I want to show in this blog post is some of new features that are available and that I have found particularly useful. But before we get onto those commands let us first have a look at how you can get setup and running with SQL Command Line.

Download & Setup

The current download of SQL Command Line can be found under the SQL Developer 4.1 EA Download page. I’m assuming when 4.1 is formally released the download for SQL Command line will be on the main SQL Developer Download web page.

SQL CL 1

After you have downloaded the file, all you need to do is to unzip the file and then copy the unzipped directory to where you want the software to be located on your client.

Now you are ready to get started with using SQL Command Line.

Connecting to your Oracle Schema

(That) Jeff Smith and Barry McGillin have a couple of good blog posts on the different connection methods and some setup or configuration you might need to consider. Check out these links for more details.

For me I did not have to do any additional setup or configuration. I was able to use the TNS Names and the EZConnect methods without any problems.

The following how to connect to my (DMUSER) schema using the EZConnect method. With this method we pass in the username, password, the host name, port number and the service name. Just like this

> sql dmuser/dmuser@localhost:1521/pdb12c

We can not have a look at the JDBC connection details.

SQL> show jdbc

— Database Info —

Database Product Name: Oracle

Database Product Version: Oracle Database 12c Enterprise Edition Release 12.1.0.2.0 – 64bit Production

With the Partitioning, OLAP, Advanced Analytics and Real Application Testing options

Database Major Version: 12

Database Minor Version: 1

— Driver Info —

Driver Name: Oracle JDBC driver

Driver Version: 12.1.0.2.0

Driver Major Version: 12

Driver Minor Version: 1

Driver URL: jdbc:oracle:thin:@localhost:1521/pdb12c

SQL>

If we have a TNSNAMES.ORA file on our computer and the directory that it is in, is on the search PATH, then we can use the service names defined in the TNSNAMES.ORA file. The following example shows you how to use this in two ways. The first shows how to enter all the details when you are starting SQL CL and the other is when SQL CL prompts you for each parameter.

> sql dmuser/dmuser@pdb12c

and when we are prompted to enter the parameters, we get the following.

> sql

SQLcl: Release 4.1.0 Beta on Thu Mar 05 15:16:12 2015

Copyright (c) 1982, 2015, Oracle. All rights reserved.

SQLcl: Release 4.1.0 Beta on Thu Mar 05 15:16:14 2015

Copyright (c) 1982, 2015, Oracle. All rights reserved.

Username? (”?) dmuser

Password? (**********?) ******

Database? (”?) pdb12c

Connected to:

Oracle Database 12c Enterprise Edition Release 12.1.0.2.0 – 64bit Production

SQL>

As you can see these work in the same way as when we use SQL*Plus.

Now that you are connected to your schema, what else can you do? The following sections are some useful commands.

Commands & Help

The following list of commands is by no means a complete list of commands available in SQL Command Line. Theoretically everything you can currently do in SQL*Plus you can also do in SQL Command Line (theoretically) But the commands I give examples of below are some of my favourites (so far).

You can get the list of commands by typing help at the SQL prompt.

SQL> help

Then to get help on a specific command you can just add the command after the help.

SQL> help cd

CD

Changes path to look for script at after startup.

(show SQLPATH shows the full search path currently:

– CD current directory setting set by last cd command

– baseURL (url for subscripts)

– topURL (top most url when starting script)

– Last Node opened (i.e. file in worksheet)

– Where last script started

– Last opened on sqlplus path related file chooser

– SQLPATH setting

– “.” if in SQLDeveloper UI (included in SQLPATH in command line (sdsql))

).

SQL>

Some work is still needed on the help documentation and what is listed for each command, as the current version is missing some important details.

Alais

This is by far my favourite new feature. This allows us to take some of our most common SQL statements and to create a shortcut for it.

Very soon I will not be using Oracle SQL but I will be using My SQL, as I will have created my own personalised version of SQL.

To list what aliases you have defined in your schema you can type

SQL > alais

Oracle will have a few aliases already defined in SQL CL. By having a look at some of these you can see some of what you want they can do and get ideas for what you might want to do with them. To list the contents of an alias, you can use the following command.

alias list {alias name}

for example

SQL > alias list tables

This command lists the query that is used for the ‘tables’ alias that comes with SQL CL.

I use Oracle Data Miner a lot and when you use this tool it can create a number of tables with a variety of names in your schema. Most of these you will never need to look at. So what I do is create an alias that excludes these from the list of tables in my schema.

SQL> alias tables2=select table_name from user_tables where table_name not like ‘ODMR$%’ and table_name not like ‘DM$%’ and table_name not like ‘SYS_IOT%’;

So now all I need to do to list my important data only tables (and exclude all the Oracle Data Miner tables) I can run my alias ‘table2’.

SQL> tables2

You will quickly build up a suite of commands using aliases.

info and >info+

info and info+ are the new commands to replace the DESC command.

The difference between info and info+ is that info+ gives you some statistical information about the table and the attributes in the table. This is illustrated in the following examples.

Example using ‘info’

Sqlcl 2

Example using ‘info+’

Sqlcl 3

CTAS & DDL

If you want to get the DDL script to create a copy of a table you have two options open to you. The first of these is the DDL command. This creates a DDL statement based on the meta data for the table, just like in the following

Sqlcl 4

An alternative to this is to use the CTAS command that will give a slightly different output to DDL command. With the CTAS we also get the CREATE TABLE .. AS SELECT …

History

In SQL*Plus we had a limited scroll through our previous commands. The same kind of scrolling is available in SQL CL, but we can get to see all our previous commands using the ‘history’ command. The following illustrates how you can list all you previous commands, I’m sure it is limited to a certain number or will be otherwise it will become a very long, long list.

SQL> history

To find out how often each command has been run you can run

SQL> history usage

and to find out how long the query took to run the last time it was run

SQL> history time

There are lots more that I could show, but this post is way, way to long as it is. What I suggest you do is go and download SQL CL (Command Line) and start using it today.

ODMr : Graph Node: Zooming in on Graphs

Posted on

When Oracle Data Miner (ODMr) 4.0 (which is part of SQL Developer) came out back in late 2013 there was a number of new features added to the tool. One of these was a Graph node that allows us to create various graphs and charts that include Line, Scatter, Bar, Histogram and Box plot.

I’ve been using this node recently to produce graphs and particularly scatter plots. I’ve been using the scatter plots to graph the Actual values in a data set against the Predicted values that was generated by ODMr. In this scenario I had a separate data set for training my ODM data mining models and another testing data set for, well testing how well the model performed against an unseen data set.

In general the graphs produced by the Graph node look good and gives you the information that you need. But what I found was that as you increased the size of the data set, the scatter plot can look a messy. This was in part due to the size of the square used to represent a data point. As the volume of data increased then your scatter plot could just look like a coloured in area of blue squares. This is illustrated in the following image.

Graph node 1

What I discovered today is that you can zoom in on this graph to explore different regions and data point on it. This do this you need to select an data that is within the x-axis and y-axis area. When you do this you will see a box form on your graph that selects the area that you indicate by moving your mouse. After you have finished selecting the area, the Graph Node will zooms into this part of the graph and shows the data points. For example if I select the area from about 1000 on the x-axis and 1000 on the y-axis, I will get the following.

Graph node 2

Again if I select a similar are area of 350 on the x-axis and 400 on the y-axis I get the following zoomed area.

Graph node 3

You can keep zooming in on various areas.

At some point you will have finished zooming in and you will want to return to the original graph. To zoom back outward all you need to do in the graph is to click on it. When you do this you will go back to the previous step or image of the graph. You can keep doing this until you get back to the original graph. Alternatively you can zoom in and out on various parts of the graph.

Hopefully you will find this feature useful.

Issues with using latest release of ODM

Posted on

The title of this blog post makes it sound more dramatic than it actually is.

The reason for this blog post is down to me receiving a recent comment on the blog, plus having received numerous emails and a recent OTN Discussion Forum topic for Oracle Data Mining.

The main thing that they have in common is that if I use the latest version of Oracle Data Mining (ODM) it tells me that I need to upgrade my ODM Repository. What impact will this have?

The ODM Repository stores lots of information about the workflows you create using the (free) Oracle Data Mining tool that comes as part of SQL Developer. Yes you do have to pay for the OAA option, so is it really free? Well some part are like the explore node and the graph node.

If you download and want to use the latest version of the ODM tool or you want to try it out before rolling it out to others then you will need to upgrade your ODM repository.

And this the problem that people are facing.

If you upgrade then the ODM Repository it is updated to work with the latest version of the ODM tool. But what happens to everyone else who is using the previous release of the tool? The answer to that is they can no longer use ODM against their database.

Why is that? Well the version of the tool is tied to a version of the Repository. If you upgrade to the newer tool and repository then your older versions of the ODM tool no longer work.

The result of all of this is that you cannot have a mixture of versions of the ODM tool (SQL Developer) being used in your team/company.

There is a very simple solution to all of this. Everyone uses the same version of the ODM tool (i.e. the same version of SQL Developer). For example your team might be using SQL Dev 4 that was released last December. But in early March there was a new patch release 4.1. In order to use this new version of the tool all of your team needs to start using it at the same time. The first person to use it will be prompted to migrate the ODM repository. This is automatically done once you enter the password for SYS.

But in some teams this is not possible to do, you want to try out the tool to see that it works correctly before getting others to use it. The way around this is to have a separate database and use it for your testing. You can easily copy across your workflows and ODM objects to the test database.

This might not be possible for everyone, so what can you do. Create a Virtual Machine and try it out on your own desktop is one way.

The answer to this problem is not ideal, but hopefully you have a better idea of why things are happening this way and what you can or cannot do about it.

Like I said at the topic of this blog post that the title is a bit more dramatic than is really the case 🙂

My next blog post will be on another question I’ve been asked a few times and this is ‘When I go to use the ODM tool it tells me that the Oracle Text feature of Oracle needs to be enabled’

ODM Graph Node new feature

Posted on

With the recent release of SQL Developer 4.0.1 there has been some very minor bug fixes for Oracle Data Miner. But there has been one particular enhancement that I wanted to have a look at. This blog post will look at this new feature and how you can use it too. In the previously released version of the Oracle Data Miner tool we had a Graph Node. This is really a new feature that came with SQL Developer 4 and was available in the Early Adopter releases since July 2013. During the Early Adopter releases and with the official release of SQL Dev 4, the Line Graph feature of the Graph Node only allowed you to have one line drawn i.e. for a single Y attribute. Now in SQL Dev/ODM 4.0.1 the Graph Node has been updated to allow you to have multiple Line Graphs that are stacked. To illustrate this we will need to define what data source we are going to use and to create a shell of a Graph Node in Oracle Data Miner. The following diagram illustrates this. In this example I’m using the CARS data set that is available on OTN.

New graph

When you have these nodes created you are now ready to create your graphs. To do this double click on the Graph Node. You can now set the attribute to use for the X-axis, in my example this will be MODEL_YEAR. Then for the Y-Axis select the attributes you want to include in the stacked graph by holding down the control key as you select each attribute

New graph2

You are now ready to create the graph. To do this click on the OK button and you will have your stacked line graph.

New graph3