SQL

RandomForests in R, Python and SQL

Posted on Updated on

I recently wrote a two part article explaining how Random Forests work and how to use them in R, Python and SQL.

These were posted on ToadWorld webpages. Check them out.

Part 1 of article

https://blog.toadworld.com/2018/08/31/random-forest-machine-learning-in-r-python-and-sql-part-1

 

Part 2 of article

https://blog.toadworld.com/2018/09/01/random-forest-machine-learning-in-r-python-and-sql-part-2

Advertisements

R vs Python vs SQL for Machine Learning (Infographic)

Posted on Updated on

Next week I’ll be giving several presentation on machine learning at Oracle Open World and Oracle Code One. In one of these presentation an evaluation of using R vs Python vs SQL will be given and discussed.

Check out the infographic containing the comparisons.

Info Graphic

 

OUG Ireland 2017 Presentation

Posted on

Here are the slides from my presentation at OUG Ireland 2017. All about running R using SQL.

Presentations from OUGN17

Posted on

Here are the presentations I gave at OUG Norway last week. These are also available on SlideShare

Formatting results from ORE script in a SELECT statement

Posted on

This blog post looks at how to format the output or the returned returns from an Oracle R Enterprise (ORE), user defined R function, that is run using a SELECT statement in SQL.

Sometimes this can be a bit of a challenge to work out, but it can be relatively easy once you have figured out how to do it. The following examples works through some scenarios of different results sets from a user defined R function that is stored in the Oracle Database.

To run that user defined R function using a SELECT statement I can use one of the following ORE SQL functions.

  • rqEval
  • rqTableEval
  • rqGroupEval
  • rqRowEval

For simplicity we will just use the first of these ORE SQL functions to illustrate the problem and how to go about solving it. The rqEval ORE SQL function is a generate purpose function to call a user defined R script stored in the database. The function does not require any input data set and but it will return some data. You could use this to generate some dummy/test data or to find some information in the database. Here is noddy example that returns my name.

BEGIN
   --sys.rqScriptDrop('GET_NAME');
   sys.rqScriptCreate('GET_NAME',
      'function() {
         res<-data.frame("Brendan")
         res
         } ');
END;

To call this user defined R function I can use the following SQL.

select *
from table(rqEval(null,
                  'select cast(''a'' as varchar2(50))  from dual',
                  'GET_NAME') );  

For text strings returned you need to cast the returned value giving a size.

If we have a numeric value being returned we can don’t have to use the cast and instead use ‘1’ as shown in the following example. This second example extends our user defined R function to return my name and a number.

BEGIN
   sys.rqScriptDrop('GET_NAME');
   sys.rqScriptCreate('GET_NAME',
      'function() {
         res<-data.frame(NAME="Brendan", YEAR=2017)
         res
         } ');
END;

To call the updated GET_NAME function we now have to process two returned columns. The first is the character string and the second is a numeric.

select *
from table(rqEval(null,
                  'select cast(''a'' as varchar2(50)) as "NAME", 1 AS YEAR  from dual',
                  'GET_NAME') );                  

These example illustrate how you can process character strings and numerics being returned by the user defined R script.

The key to setting up the format of the returned values is knowing the structure of the data frame being returned by the user defined R script. Once you know that the rest is (in theory) easy.

Explicit Semantic Analysis setup using SQL and PL/SQL

Posted on Updated on

In my previous blog post I introduced the new Explicit Semantic Analysis (ESA) algorithm and gave an example of how you can build an ESA model and use it. Check out this link for that blog post.

In this blog post I will show you how you can manually create an ESA model. The reason that I’m showing you this way is that the workflow (in ODMr and it’s scheduler) may not be for everyone. You may want to automate the creation or recreation of the ESA model from time to time based on certain business requirements.

In my previous blog post I showed how you can setup a training data set. This comes with ODMr 4.2 but you may need to expand this data set or to use an alternative data set that is more in keeping with your domain.

Setup the ODM Settings table

As with all ODM algorithms we need to create a settings table. This settings table allows us to store the various parameters and their values, that will be used by the algorithm.

-- Create the settings table
CREATE TABLE ESA_settings (
    setting_name VARCHAR2(30),
    setting_value VARCHAR2(30));

-- Populate the settings table
-- Specify ESA. By default, Naive Bayes is used for classification.
-- Specify ADP. By default, ADP is not used. Need to turn this on.
BEGIN
    INSERT INTO ESA_settings (setting_name, setting_value)
    VALUES (dbms_data_mining.algo_name,       
           dbms_data_mining.algo_explicit_semantic_analys);
   
    INSERT INTO ESA_settings (setting_name, setting_value)
    VALUES (dbms_data_mining.prep_auto,dbms_data_mining.prep_auto_on);
  
    INSERT INTO ESA_settings (setting_name, setting_value)
    VALUES (odms_sampling,odms_sampling_disable);
  
    commit;
END; 

These are the minimum number of parameter setting needed to run the ESA algorithm. The other ESA algorithm setting include:

NewImage

Setup the Oracle Text Policy

You also need to setup an Oracle Text Policy and a lexer for the Stopwords.

DECLARE
   v_policy_name  varchar2(30);
   v_lexer_name   varchar2(3)
BEGIN
    v_policy_name  := 'ESA_TEXT_POLICY';
    v_lexer_name   := 'ESA_LEXER';
    ctx_ddl.create_preference(v_lexer_name, 'BASIC_LEXER');
    v_stoplist_name := 'CTXSYS.DEFAULT_STOPLIST';  -- default stop list
    ctx_ddl.create_policy(policy_name => v_policy_name, lexer => v_lexer_name, stoplist => v_stoplist_name);
END;

Create the ESA model

Once we have the settings table created with the parameter values set for the algorithm and the Oracle Text policy created, we can now create the model.

To ensure that the Oracle Text Policy is applied to the text we want to analyse we need to create a transformation list and add the Text Policy to it.

We can then pass the text transformation list as a parameter to the CREATE_MODEL, procedure.

DECLARE
   v_xlst              dbms_data_mining_transform.TRANSFORM_LIST;
   v_policy_name       VARCHAR2(130) := 'ESA_TEXT_POLICY';
   v_model_name        varchar2(50) := 'ESA_MODEL_DEMO_2';
BEGIN
   v_xlst := dbms_data_mining_transform.TRANSFORM_LIST();
   DBMS_DATA_MINING_TRANSFORM.SET_TRANSFORM(v_xlst, '"TEXT"', NULL, '"TEXT"', '"TEXT"', 'TEXT(POLICY_NAME:'||v_policy_name||')(MAX_FEATURES:3000)(MIN_DOCUMENTS:1)(TOKEN_TYPE:NORMAL)');

    DBMS_DATA_MINING.DROP_MODEL(v_model_name, TRUE);
    DBMS_DATA_MINING.CREATE_MODEL(
        model_name          => v_model_name,
        mining_function     => DBMS_DATA_MINING.FEATURE_EXTRACTION,
        data_table_name     => 'WIKISAMPLE',
        case_id_column_name => 'TITLE',
        target_column_name  => NULL,
        settings_table_name => 'ESA_SETTINGS',
        xform_list          => v_xlst);
END;

NOTE: Yes we could have merged all of the above code into one PL/SQL block.

Use the ESA model

We can now use the FEATURE_COMPARE function to use the model we just created, just like I did in my previous blog post.

SELECT FEATURE_COMPARE(ESA_MODEL_DEMO_2
               USING 'Oracle Database is the best available for managing your data' text 
               AND USING 'The SQL language is the one language that all databases have in common' text) similarity 
FROM DUAL;

Go give the ESA algorithm a go and see where you could apply it within your applications.

Auditing Oracle Data Mining model usage

Posted on Updated on

In a previous blog post I talked about how you can rename and comment your Oracle Data Mining models. This is to allow you to easily to see and understand the intended use of the data mining model.

Another feature available to you is to audit the usage of the the data mining models. As your data mining environment grows to many 10s or more typically 100s of models, you will need to have some way of tracking their usage. This can allow you to discover what models are frequently being used and those that are not being used in-frequently. You can then use this information to investigate if there are any issues. Or in some companies I’ve seen an internal charging scheme in place for each time the models are used.

The following outlines the steps required to setup the auditing of your models and how to inspect the usage.

Note: You will need to the AUDIT_ADMIN role to audit the models.

First create an audit policy for the data mining model in a particular schema.

CREATE AUDIT POLICY oaa_odm_audit_usage 
ACTIONS ALL 
ON MINING MODEL dmuser.high_value_churn_clas_svm;

This creates a policy that monitors all activity on the data mining model HIGH_VALUE_CHURN_CLAS_SVM in the DMUSER schema.

Now we need to enable the policy and allow to to tract all activity on the model.

AUDIT POLICY oaa_odm_audit_usage BY oaa_model_user;

This will track all usage of the data mining model by the schema call OAA_MODEL_USER. We can then use the following query to search for the audit records for the OAA_MODEL_USER schema.

SELECT dbusername,
       action_name, 
       systemm_privilege_used, 
       return_code,
       object_schema, 
       object_name, 
       sql_text
FROM  unified_audit_trail
WHERE object_name = 'HIGH_VALUE_CHURN_CLAS_SVM';

But there is a little problem with using what I’ve just shown you above. The problem is that it will track all activity on the data mining model. Perhaps this isn’t what we really want. Perhaps we only want to track only certain activity of the data mining model. Instead of creating the policy using ‘ACTIONS ALL’, we can list out the actions or operations we want to track. For example, we want to tract when it is used in a SELECT. The following shows how you can set this up for just SELECT.

CREATE AUDIT POLICY oaa_odm_audit_select 
ACTIONS SELECT 
ON MINING MODEL dmuser.high_value_churn_clas_svm;

AUDIT POLICY oaa_odm_audit_select BY oaa_model_user;

The list of individual audit events you can use include:

  • AUDIT
  • COMMENT
  • GRANT
  • RENAME
  • SELECT

A policy can be setup to tract one or more of these events. For example, if we wanted a policy to track SELECT and GRANT, we would have list each event separated by a comma.

CREATE AUDIT POLICY oaa_odm_audit_select_grant 
ACTIONS SELECT 
ON MINING MODEL dmuser.high_value_churn_clas_svm,
ACTIONS GRANT 
ON MINING MODEL dmuser.high_value_churn_clas_svm,
;

AUDIT POLICY oaa_odm_audit_select_grant BY oaa_model_user;