Part 2 – Do I have permissions to use the data for data profiling?

Posted on Updated on

This is the second part of series of blog posts on ‘How the EU GDPR will affect the use of Machine Learning

I have the data, so I can use it? Right?

I can do what I want with that data? Right? (sure the customer won’t know!)

NO. The answer is No you cannot use the data unless you have been given the permission to use it for a particular task.

The GDPR applies to all companies worldwide that process personal data of European Union (EU) citizens. This means that any company that works with information relating to EU citizens will have to comply with the requirements of the GDPR, making it the first global data protection law.

NewImage

The GDPR tightens the rules for obtaining valid consent to using personal information. Having the ability to prove valid consent for using personal information is likely to be one of the biggest challenges presented by the GDPR. Organisations need to ensure they use simple language when asking for consent to collect personal data, they need to be clear about how they will use the information, and they need to understand that silence or inactivity no longer constitutes consent.

NewImage

You will need to investigate the small print of all the terms and conditions that your customers have signed. Then you need to examine what data you have, how and where it was collected or generated, and then determine if I have to use this data beyond what the original intention was. If there has been no mention of using the customer data (or any part of it) for analytics, profiling, or anything vaguely related to it then you cannot use the data. This could mean that you cannot use any data for your analytics and/or machine learning. This is a major problem. No data means no analytics and no targeting the customers with special offers, etc.

NewImage

Data cannot be magically produced out of nowhere and it isn’t the fault of the data science team if they have no data to use.

How can you over come this major stumbling block?

The first place is to review all the T&Cs. Identify what data can be used and what data cannot be used. One approach for data that cannot be used is to update the T&Cs and get the customers to agree to them. Yes they need to explicitly agree (or not) to them. Giving them a time limit to respond is not allowed. It needs to be explicit.

NewImage

Yes this will be hard work. Yes this will take time. Yes it will affect what machine learning and analytics you can perform for some time. But the sooner you can identify these area, get the T&Cs updated, get the approval of the customers, the sooner the better and ideally all of this should be done way in advance on 25th May, 2018.

NewImage

In the next blog post I will look at addressing Discrimination in the data and in the machine learning models.

Click back to ‘How the EU GDPR will affect the use of Machine Learning – Part 1‘ for links to all the blog posts in this series.

How the EU GDPR will affect the use of Machine Learning – Part 1

Posted on

On 5 December 2015, the European Parliament, the Council and the Commission reached agreement on the new data protection rules, establishing a modern and harmonised data protection framework across the EU. Then on 14th April 2016 the Regulations and Directives were adopted by the European Parliament.

NewImage

The EU GDPR comes into effect on the 25th May, 2018.

Are you ready ?

The EU GDPR will affect every country around the World. As long as you capture and use/analyse data captured with the EU or by citizens in the EU then you have to comply with the EU GDPR.

Over the past few months we have seen a increase in the amount of blog posts, articles, presentations, conferences, seminars, etc being produced on how the EU GDPR will affect you. Basically if your company has not been working on implementing processes, procedures and ensuring they comply with the regulations then you a bit behind and a lot of work is ahead of you.

Like I said there was been a lot published and being talked about regarding the EU GDPR. Most of this is about the core aspects of the regulations on protecting and securing your data. But very little if anything is being discussed regarding the use of machine learning and customer profiling.

Do you use machine learning to profile, analyse and predict customers? Then the EU GDPRs affect you.

Article 22 of the EU GDPRs outlines some basic capabilities regarding machine learning, and in additionally Articles 13, 14, 19 and 21.

Over the coming weeks I will have the following blog posts. Each of these address a separate issue, within the EU GDPR, relating to the use of machine learning.

  • Part 2 – Do I have permissions to use the data for data profiling?
  • Part 3 – Ensuring there is no Discrimination in the Data and machine learning models.
  • Part 4 – (Article 22: Profiling) Why me? and how Oracle 12c saves the day

NewImage

Installing Scala and Apache Spark on a Mac

Posted on

The following outlines the steps I’ve followed to get get Scala and Apache Spark installed on my Mac. This allows me to play with Apache Spark on my laptop (single node) before deploying my code to a multi-node cluster.

1. Install Homebrew

Homebrew seems to be the standard for installing anything on a Mac. To install Homebrew run

/usr/bin/ruby -e "$(curl -fsSL https://raw.githubusercontent.com/Homebrew/install/master/install)"

NewImage

When prompted enter your system/OS password to allow the install to proceed.

NewImage

NewImage

2. Install xcode-select (if needed)

You may have xcode-select already installed. This tool allows you to install the languages using command line.

xcode-select --install

If it already installed then nothing will happen and you will get the following message.

xcode-select: error: command line tools are already installed, use "Software Update" to install updates

3. Install Scala

[If you haven’t installed Java then you need to also do this.]

Use Homebrew to install scala.

brew install scala

NewImage

4. Install Apache Spark

Now to install Apache Spark.

brew install apache-spark

NewImage

5. Start Spark

Now you can start the Apache Spark shell.

spark-shell

NewImage

6. Hello-World and Reading a file

The traditional Hello-World example.

scala> val helloWorld = "Hello-World"
helloWorld: String = Hello-World

or

scala> println("Hello World")
Hello World

What is my current working directory.

scala> val whereami = System.getProperty("user.dir")
whereami: String = /Users/brendan.tierney

Read and process a file.

scala> val lines = sc.textFile("docker_ora_db.txt")
lines: org.apache.spark.rdd.RDD[String] = docker_ora_db.txt MapPartitionsRDD[3] at textFile at :24

scala> lines.count()
res6: Long = 36

scala> lines.foreach(println)
####################################################################
## Specify the basic DB parameters
## Copyright(c) Oracle Corporation 1998,2016. All rights reserved.##
##                                                                ##
##------------------------------------------------------------------
##                   Docker OL7 db12c dat file                    ##

##                                                                ##
## db sid (name)
####################################################################
## default : ORCL

## cannot be longer than 8 characters
##------------------------------------------------------------------

...

There will be a lot more on how to use Spark and how to use Spark with Oracle (all their big data stuff) over the coming months.

[I’ve been busy for the past few months working on this stuff, EU GDPR issues relating to machine learning, and other things. I’ll be sharing some what I’ve been working on and learning in blog posts over the coming weeks]

Slides from the Ireland OUG Meetup May 2017

Posted on

Here are some of the slides from our meetup on 11th May 2017.

The remaining slides will be added when they are available.

OUG Ireland Meetup 11th May

Posted on Updated on

The next OUG Ireland Meetup is happening on 11th May, in the Bank of Ireland Grand Canal Dock. This is a free event and is open to every one. You don’t have to be a member to attend.

Following on from a very successful 2 day OUG Ireland Conference with over 250 attendees, we have organised our next meetup. This was mentioned during the opening session of the conference.

NewImage

We typically have 2 presentations at each Meetup and on 11th May we have:

1. Oracle Analytics Cloud Service.

Oralce Analytics Cloud Service was only released a few weeks ago and we some local people who have been working with the beta and early adopter releases. They will be giving us some insights on this new product and how it compares with other analytics products like Oracle Data Visualization and OBIEE.

Running Oracle DataGuard on RAC on Oracle 12c

The second presentation will be on using Oracle DataGuard on RAC on Oracle 12c. We have a very experienced DBA talking about his experiences of using these products how to workaround some key bugs and situations to be aware of for administration purposes. Lots of valuable information to be gained.

Check out the full agenda and to register for the Meetup by clicking on this link or on the Meetup image above.

There will be some food and refreshments available for you to enjoy.

The Meetup will be in Bank of Ireland, Grand Canal Dock. This venue is a very popular locations for Meetups in Dublin.

NewImage

Setting up Oracle Database on Docker

Posted on

A couple of days ago it was announced that several Oracle images were available on the Docker Store.

This is by far the easiest Oracle Database install I have every done !

You simply have no excuse now for not installing and using an Oracle Database. Just go and do it now!

The following steps outlines what I did you get an Oracle 12.1c Database.

1. Download and Install Docker

There isn’t much to say here. Just go to the Docker website, select the version docker for your OS, and just install it.

You will probably need to create an account with Docker.

NewImage

After Docker is installed it will automatically start and and will be placed in your system tray etc so that it will automatically start each time you restart your laptop/PC.

2. Adjust the memory allocation

From the system tray open the Docker application. In the Advanced section allocate a bit more memory. This will just make things run a bit smoother. Be a bit careful on how much to allocate.

NewImage

In the General section check the tick-box for automatically backing up Docker VMs. This is assuming you have back-ups setup, for example with Time Machine or something similar.

3. Download & Edit the Oracle Docker environment File

On the Oracle Database download Docker webpage, click on the the Get Content button.

NewImage

You will have to enter some details like your name, company, job title and phone number, then click on the check-box, before clicking on the Get Content button. All of this is necessary for the Oracle License agreement.

The next screen lists the Docker Services and Partner Services that you have signed up for.

NewImage

Click on the Setup button to go to the webpage that contains some of the setup instructions.

NewImage

The first thing you need to do is to copy the sample Environment File. Create a new file on your laptop/desktop and paste the environment file contents into the file. There are a few edits you need to make to this file. The following is the edited/modified Environment file that I created and used. The changes are for DB_SID, DB_PASSWD and DB_DOMAIN.

####################################################################
## Copyright(c) Oracle Corporation 1998,2016. All rights reserved.##
##                                                                ##
##                   Docker OL7 db12c dat file                    ##
##                                                                ##
####################################################################

##------------------------------------------------------------------
## Specify the basic DB parameters
##------------------------------------------------------------------

## db sid (name)
## default : ORCL
## cannot be longer than 8 characters

DB_SID=ORCL

## db passwd
## default : Oracle

DB_PASSWD=oracle

## db domain
## default : localdomain

DB_DOMAIN=localdomain

## db bundle
## default : basic
## valid : basic / high / extreme
## (high and extreme are only available for enterprise edition)

DB_BUNDLE=basic

## end

I called this file ‘docker_ora_db.txt

4. Download and Configure Oracle Database for Docker

The following command will download and configure the docker image

$ docker run -d --env-file ./docker_ora_db.txt -p 1527:1521 -p 5507:5500 -it --name dockerDB121 --shm-size="8g" store/oracle/database-enterprise:12.1.0.2

This command will create a container called ‘dockerDB121’. The 121 at the end indicate the version number of the Oracle Database. If you end up with a number of containers containing different versions of the Oracle Database then you need some way of distinguishing them.

Take note of the port mapping in the above command, as you will need this information later.

When you run this command, the docker image will be downloaded from the docker website, will be unzipped and the container setup and ready to run.

NewImage

5. Log-in and Finish the configuration

Although the docker container has been setup, there is still a database configuration to complete. The following images shows that the new containers is there.

NewImage

To complete the Database setup, you will need to log into the Docker container.

docker exec -it dockerDB121 /bin/bash

Then run the Oracle Database setup and startup script (as the root user).

/bin/bash /home/oracle/setup/dockerInit.sh

NewImage

This script can take a few minutes to run. On my laptop it took about 2 minutes.

When this is finished the terminal session will open as this script goes into a look.

To run any other commands in the container you will need to open another terminal session and connect to the Docker container. So go open one now.

6. Log into the Database in Docker

In a new terminal window, connect to the Docker container and then switch to the oracle user.

su - oracle

Check that the Oracle Database processes are running (ps -ef) and then connect as SYSDBA.

sqlplus / as sysdba

Let’s check out the Database.

SQL> select name,DB_UNIQUE_NAME from v$database;

NAME	  DB_UNIQUE_NAME
--------- ------------------------------
ORCL	  ORCL


SQL> SELECT v.name, v.open_mode, NVL(v.restricted, 'n/a') "RESTRICTED", d.status
     FROM v$pdbs v, dba_pdbs d
     WHERE v.guid = d.guid
     ORDER BY v.create_scn;


NAME			       OPEN_MODE  RES STATUS
------------------------------ ---------- --- ---------
PDB$SEED		       READ ONLY  NO  NORMAL
PDB1			       READ WRITE NO  NORMAL

And the tnsnames.ora file contains the following:

ORCL =   (DESCRIPTION = 
    (ADDRESS = (PROTOCOL = TCP)(HOST = 0.0.0.0)(PORT = 1521))
     (CONNECT_DATA = 
      (SERVER = DEDICATED)
       (SERVICE_NAME = ORCL.localdomain)     )   )

PDB1 =   (DESCRIPTION =
     (ADDRESS = (PROTOCOL = TCP)(HOST = 0.0.0.0)(PORT = 1521))
      (CONNECT_DATA =
       (SERVER = DEDICATED)
       (SERVICE_NAME = PDB1.localdomain)     )   )

You are now up an running with an Docker container running an Oracle 12.1 Databases.

7. Configure SQL Developer (on Client) to

access the Oracle Database on Docker

You can not use your client tools to connect to the Oracle Database in a Docker Container. Here is a connection setup in SQL Developer.

NewImage

Remember that port number mapping I mentioned in step 4 above. See in this SQL Developer connection that the port number is 1527.

Thats it. How easy is that. You now have a fully configured Oracle 12.1c Enterprise Edition Database to play with, to have fun and to explore all the wonderful features of the Oracle Database.

ODM Model View Details Views in Oracle 12.2

Posted on

A new feature for Oracle Data Mining in Oracle 12.2 is the new Model Details views.

In Oracle 11.2.0.3 and up to Oracle 12.1 you needed to use a range of PL/SQL functions (in DBMS_DATA_MINING package) to inspect the details of a data mining/machine learning model using SQL.

Check out these previous blog posts for some examples of how to use and extract model details in Oracle 12.1 and earlier versions of the database

Association Rules in ODM-Part 3

Extracting the rules from an ODM Decision Tree model

Cluster Details

Viewing Decision Tree Details

Instead of these functions there are now a lot of DB views available to inspect the details of a model. The following table summarises these various DB Views. Check out the DB views I’ve listed after the table, as these views might some some of the ones you might end up using most often.

I’ve now chance of remembering all of these and this table is a quick reference for me to find the DB views I need to use. The naming method used is very confusing but I’m sure in time I’ll get the hang of them.

NOTE: For the DB Views I’ve listed in the following table, you will need to append the name of the ODM model to the view prefix that is listed in the table.

table, th, td {
border: 1px solid black;
border-collapse: collapse;
text-align: left;
}

Data Mining Type Algorithm & Model Details 12.2 DB View Description
Association Association Rules DM$VR generated rules for Association Rules
Frequent Itemsets DM$VI describes the frequent itemsets
Transaction Itemsets DM$VT describes the transactional itemsets view
Transactional Rules DM$VA describes the transactional rule view and transactional itemsets
Classification (General views for Classification models) DM$VT

DM$VC

describes the target distribution for Classification models

describes the scoring cost matrix for Classification models

Decision Tree DM$VP

DM$VI

DM$VO

DM$VM

describes the DT hierarchy & the split info for each level in DT

describes the statistics associated with individual tree nodes

Higher level node description

describes the cost matrix used by the Decision Tree build

Generalized Linear Model DM$VD

DM$VA

describes model info for Linear Regres & Logistic Regres

describes row level info for Linear Regres & Logistic Regres

Naive Bayes DM$VP

DM$VV

describes the priors of the targets for Naïve Bayes

describes the conditional probabilities of Naïve Bayes model

Support Vector Machine DM$VL describes the coefficients of a linear SVM algorithm
Regression ??? Doe 80 50
Clustering (General views for Clustering models) DM$VD

DM$VA

DM$VH

DM$VR

Cluster model description

Cluster attribute statistics

Cluster historgram statistics

Cluster Rule statistics

k-Means DM$VD

DM$VA

DM$VH

DM$VR

k-Means model description

k-Means attribute statistics

k-Means historgram statistics

k-Means Rule statistics

O-Cluster DM$VD

DM$VA

DM$VH

DM$VR

O-Cluster model description

O-Cluster attribute statistics

O-Cluster historgram statistics

O-Cluster Rule statistics

Expectation Minimization DM$VO

DM$VB

DM$VI

DM$VF

DM$VM

DM$VP

describes the EM components

the pairwise Kullback–Leibler divergence

attribute ranking similar to that of Attribute Importance

parameters of multi-valued Bernoulli distributions

mean & variance parameters for attributes by Gaussian distribution

the coefficients used by random projections to map nested columns to a lower dimensional space

Feature Extraction Non-negative Matrix Factorization DM$VE

DM$VI

Encoding (H) of a NNMF model

H inverse matrix for NNMF model

Singular Value Decomposition DM$VE

DM$VV

DM$VU

Associated PCA information for both classes of models

describes the right-singular vectors of SVD model

describes the left-singular vectors of a SVD model

Explicit Semantic Analysis DM$VA

DM$VF

ESA attribute statistics

ESA model features

Feature Section Minimum Description Length DM$VA describes the Attribute Importance as well as the Attribute Importance rank

Normalizing and Error Handling views created by ODM Automatic Data Processing (ADP)

  • DM$VN : Normalization and Missing Value Handling
  • DM$VB : Binning

Global Model Views

  • DM$VG : Model global statistics
  • DM$VS : Computed model settings
  • DM$VW :Alerts issued during model creation

Each one of these new DB views needs their own blog post to explain what informations is being explained in each. I’m sure over time I will get round to most of these.