INNOVATIONS IN DATABASES
A technical and juridical perspective
Prof. Dr. Guy De Tré
Ghent University
Database, Document and Content Management
OUTLINE
Big data management challenges
NoSQL database solutions
Juridical challenges
BIG DATA MANAGEMENT CHALLENGES
BIG DATA: A DEFINITION…
Data whose characteristics prevent them from being handled efficiently by conventional information systems
BIG DATA: FOUR MAIN CHARACTERISTICS
Volume: Big data
Variety: Heterogeneous data
Velocity: Fast data
Veracity: Bad data
NEW CHALLENGES FOR DATA MANAGEMENT SYSTEMS
Big data: scaling out to distributed data storage (availability vs. consistency)
Heterogeneous data: avoid data transformations
Fast data: avoid data processing overhead
Bad data: data quality assessment and handling
NOSQL DATABASE SOLUTIONS
Key-value stores
Document stores
Column stores
KEY-VALUE STORES
Data schema (no database schema!)
SQL + NoSQL
Visitor opinions (Bezoekersopinies)
Key (BID, timestamp)    Value
B1, 15/1:14u00          ‘room 1’
B1, 15/1:14u01          ‘room 1, not fun, too crowded’
B1, 15/1:14u02          ‘room 1, 2/10’
B2, 15/1:14u02          ‘room 1, Rembrandt is sublime’
B1, 15/1:14u03          ‘room 1, ’
B2, 15/1:14u03          ‘room 1, 9/10’
B1, 15/1:14u04          ‘room 2, more my thing, 7/10’
B2, 15/1:14u04          ‘room 1, more of this pls’
B3, 15/1:14u04          ‘room 1, so crowded’
KEY-VALUE STORES
Limited interaction via API
Visitor opinions (Bezoekersopinies)
Key (BID, timestamp)    Value
B1, 15/1:14u00          ‘room 1’
B1, 15/1:14u01          ‘room 1, not fun, too crowded’
B1, 15/1:14u02          ‘room 1, 2/10’
B2, 15/1:14u02          ‘room 1, Rembrandt is sublime’
B1, 15/1:14u03          ‘room 1, ’
B2, 15/1:14u03          ‘room 1, 9/10’
B1, 15/1:14u04          ‘room 2, more my thing, 7/10’
B2, 15/1:14u04          ‘room 1, more of this pls’
B3, 15/1:14u04          ‘room 1, so crowded’
Get(B1, 15/1:14u02)
Result: ‘room 1, 2/10’
Put(B3, 15/1:14u05, ‘room 1, Wow!’)
Delete(B1, 15/1:14u02)
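The Get, Put and Delete calls shown above can be sketched in a few lines of Python. This is a minimal in-memory illustration, not the API of any particular key-value product: the class name `KeyValueStore` and its method names are assumptions chosen to mirror the slide, the composite key (BID, timestamp) is modelled as a tuple, and the example values are given in English.

```python
from typing import Dict, Optional, Tuple

# Hypothetical composite key: (visitor id, timestamp), as in the slide's table.
Key = Tuple[str, str]


class KeyValueStore:
    """Toy in-memory key-value store; values are opaque strings."""

    def __init__(self) -> None:
        self._data: Dict[Key, str] = {}

    def put(self, key: Key, value: str) -> None:
        # Insert or overwrite; the store never inspects the value.
        self._data[key] = value

    def get(self, key: Key) -> Optional[str]:
        # Lookup is by exact key only; the value cannot be queried.
        return self._data.get(key)

    def delete(self, key: Key) -> None:
        # Deleting a missing key is a no-op in this sketch.
        self._data.pop(key, None)


store = KeyValueStore()
store.put(("B1", "15/1:14u02"), "room 1, 2/10")
store.put(("B3", "15/1:14u05"), "room 1, Wow!")
result = store.get(("B1", "15/1:14u02"))
store.delete(("B1", "15/1:14u02"))
```

The interaction is limited precisely because the value is opaque: finding all opinions about room 1 would require the application to fetch and inspect every value itself.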
KEY-VALUE STORES
Data distribution (horizontal scaling – consistent hashing)
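Consistent hashing, named above as the distribution technique, can be illustrated with a small hash ring. The sketch below is one common variant (a ring with virtual nodes) and is an assumption about the details, which the slide does not spell out; the class `HashRing`, the node names and the choice of MD5 as a stable hash are all hypothetical.

```python
import bisect
import hashlib
from typing import Dict, List


def _hash(text: str) -> int:
    # MD5 is used only as a stable, well-spread hash, not for security.
    return int(hashlib.md5(text.encode("utf-8")).hexdigest(), 16)


class HashRing:
    """Consistent-hash ring with virtual nodes (illustrative only)."""

    def __init__(self, nodes: List[str], vnodes: int = 100) -> None:
        self._vnodes = vnodes
        self._ring: Dict[int, str] = {}   # ring position -> node name
        self._positions: List[int] = []   # sorted ring positions
        for node in nodes:
            self.add_node(node)

    def add_node(self, node: str) -> None:
        # Each node occupies several positions to spread the load evenly.
        for i in range(self._vnodes):
            pos = _hash(f"{node}#{i}")
            self._ring[pos] = node
            bisect.insort(self._positions, pos)

    def remove_node(self, node: str) -> None:
        for i in range(self._vnodes):
            pos = _hash(f"{node}#{i}")
            del self._ring[pos]
            self._positions.remove(pos)

    def node_for(self, key: str) -> str:
        # First ring position clockwise from the key's hash, wrapping around.
        idx = bisect.bisect_right(self._positions, _hash(key))
        return self._ring[self._positions[idx % len(self._positions)]]


ring = HashRing(["node-a", "node-b", "node-c"])
keys = [f"B{i}" for i in range(1000)]
before = {k: ring.node_for(k) for k in keys}

# Scaling out: add a fourth node and see which keys change owner.
ring.add_node("node-d")
after = {k: ring.node_for(k) for k in keys}
moved = [k for k in keys if before[k] != after[k]]
```

The point of the technique is visible in `moved`: only keys that now belong to the new node change owner, whereas a plain `hash(key) % node_count` scheme would reshuffle most keys whenever the node count changes.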
DOCUMENT STORES
Data schema (no database schema!)
SQL + NoSQL
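DOCUMENT STORES
More advanced interaction via API
db.opinies.find()                                                  // all opinions (opinies)
db.opinies.find({plaats: "zaal 1"})                                // place (plaats) is "zaal 1" (room 1)
db.opinies.find().sort({score: 1})                                 // all opinions, ascending by score
db.opinies.find({score: {$gt: 8}})                                 // score greater than 8
db.opinies.find({score: {$gt: 8}, plaats: "zaal 1"})               // score greater than 8 AND place is "zaal 1"
db.opinies.find({$or: [{score: {$gt: 8}}, {plaats: "zaal 1"}]})    // score greater than 8 OR place is "zaal 1"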
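To make the meaning of the query filters above concrete, the following sketch evaluates the same kinds of filter on plain Python dictionaries. It is not a database driver and supports only what the slide uses (equality, `$gt`, `$or`); the functions `matches` and `find`, the sample documents and the fields `bid` and `commentaar` are assumptions made for illustration.

```python
from typing import Any, Dict, List, Optional

Document = Dict[str, Any]


def matches(doc: Document, query: Dict[str, Any]) -> bool:
    """Return True if doc satisfies query (equality, $gt and $or only)."""
    for field, cond in query.items():
        if field == "$or":
            # At least one of the sub-queries must match.
            if not any(matches(doc, sub) for sub in cond):
                return False
        elif isinstance(cond, dict) and "$gt" in cond:
            # A missing field never satisfies a comparison.
            if field not in doc or not doc[field] > cond["$gt"]:
                return False
        elif doc.get(field) != cond:
            return False
    return True


def find(collection: List[Document],
         query: Optional[Dict[str, Any]] = None) -> List[Document]:
    # An empty or absent query matches every document.
    return [d for d in collection if matches(d, query or {})]


# Hypothetical documents; they need not share the same fields.
opinies = [
    {"bid": "B1", "plaats": "zaal 1", "score": 2},
    {"bid": "B2", "plaats": "zaal 1", "score": 9,
     "commentaar": "Rembrandt is subliem"},
    {"bid": "B1", "plaats": "zaal 2", "score": 7},
    {"bid": "B3", "plaats": "zaal 1"},  # no score at all
]

in_room_1 = find(opinies, {"plaats": "zaal 1"})
high = find(opinies, {"score": {"$gt": 8}})
both = find(opinies, {"score": {"$gt": 8}, "plaats": "zaal 1"})
either = find(opinies, {"$or": [{"score": {"$gt": 8}},
                                {"plaats": "zaal 1"}]})
by_score = sorted(find(opinies, {"score": {"$gt": 0}}),
                  key=lambda d: d["score"])
```

Conditions listed side by side in one filter are combined with AND, while `$or` takes an explicit list of alternatives; the fourth document shows that a document without a `score` field simply fails any comparison on it.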
DOCUMENT STORES
Data distribution (horizontal scaling – sharding)
Replica sets with master/slave replication
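The combination of sharding and replica sets named above can be sketched as follows. This is a simplified model under stated assumptions, not the behaviour of a specific product: shards are chosen by ranges of a shard key, replication to the secondaries (slaves) is immediate rather than asynchronous, and the classes `ReplicaSet` and `ShardedCollection` are hypothetical.

```python
from typing import Any, Dict, List


class ReplicaSet:
    """One shard: a primary (master) copy plus secondary (slave) copies."""

    def __init__(self, name: str, secondaries: int = 2) -> None:
        self.name = name
        self.primary: Dict[str, Any] = {}
        self.secondaries: List[Dict[str, Any]] = [
            {} for _ in range(secondaries)
        ]

    def write(self, key: str, doc: Any) -> None:
        # All writes go to the primary and are then copied to the secondaries.
        self.primary[key] = doc
        for replica in self.secondaries:
            replica[key] = doc

    def read(self, key: str, from_secondary: bool = False) -> Any:
        source = self.secondaries[0] if from_secondary else self.primary
        return source.get(key)


class ShardedCollection:
    """Routes each document to a shard by the range its shard key falls in."""

    def __init__(self, boundaries: List[str]) -> None:
        # boundaries ["B2", "B3"] give three shards:
        # keys < "B2", keys < "B3", and everything else.
        self.boundaries = boundaries
        self.shards = [ReplicaSet(f"shard-{i}")
                       for i in range(len(boundaries) + 1)]

    def shard_for(self, shard_key: str) -> ReplicaSet:
        for i, upper in enumerate(self.boundaries):
            if shard_key < upper:
                return self.shards[i]
        return self.shards[-1]

    def insert(self, shard_key: str, doc: Any) -> None:
        self.shard_for(shard_key).write(shard_key, doc)

    def find_one(self, shard_key: str, from_secondary: bool = False) -> Any:
        return self.shard_for(shard_key).read(shard_key, from_secondary)


collection = ShardedCollection(boundaries=["B2", "B3"])
collection.insert("B1", {"plaats": "zaal 1", "score": 2})
collection.insert("B2", {"plaats": "zaal 1", "score": 9})
collection.insert("B3", {"plaats": "zaal 1"})
```

Sharding spreads the data over several machines, while each replica set keeps several copies of one shard, so the two mechanisms address scale and availability respectively.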
COLUMN STORES
Data schema (no database schema!)
SQL + NoSQL
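COLUMN STORES
SQL-like interaction via API
SELECT taal FROM Bezoeker WHERE naam = 'Yana'          -- language (taal) of the visitor (Bezoeker) named Yana
SELECT commentaar FROM Opinie WHERE score < 5          -- comments (commentaar) of opinions scoring below 5
SELECT COUNT(*) FROM Opinie WHERE dag = '15/1/2016'    -- number of opinions on that day (dag)
Relational-style joins are not supported!
The application has to perform them itself.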
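Because joins are left to the application, a typical approach is to run two separate queries and combine the results in code. The sketch below shows such an application-side hash join in Python; the rows, the join column `bid` and the visitor "Tom" are hypothetical, since the slide does not show the table contents.

```python
from collections import defaultdict
from typing import Any, Dict, List

Row = Dict[str, Any]

# Hypothetical result sets, as two separate queries on Bezoeker (visitor)
# and Opinie (opinion) might return them.
bezoekers: List[Row] = [
    {"bid": "B1", "naam": "Yana", "taal": "nl"},
    {"bid": "B2", "naam": "Tom", "taal": "en"},
]
opinies: List[Row] = [
    {"bid": "B1", "score": 2, "commentaar": "too crowded"},
    {"bid": "B2", "score": 9, "commentaar": "Rembrandt is sublime"},
    {"bid": "B1", "score": 7, "commentaar": "more my thing"},
]


def hash_join(left: List[Row], right: List[Row], key: str) -> List[Row]:
    """Join two row lists on `key`, done by the application itself."""
    # Build phase: index the right-hand rows by their join key.
    index: Dict[Any, List[Row]] = defaultdict(list)
    for row in right:
        index[row[key]].append(row)
    # Probe phase: pair every left-hand row with its matching right rows.
    joined: List[Row] = []
    for left_row in left:
        for right_row in index.get(left_row[key], []):
            joined.append({**left_row, **right_row})
    return joined


joined = hash_join(bezoekers, opinies, "bid")
# Names of visitors who gave a score below 5.
low_scorers = [row["naam"] for row in joined if row["score"] < 5]
```

An alternative design is to avoid the join altogether by denormalising, that is, storing the visitor's name with each opinion, at the cost of redundant data.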
COLUMN STORES
Data distribution (horizontal scaling – partitioning and replication)
Horizontal partitioning and replication
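Horizontal partitioning with replication, as named above, can be sketched by hashing a partition key to pick a first node and storing copies on the following nodes. This is a deliberately simple model and an assumption about the details: the class `Cluster`, the `replication_factor` parameter and the modulo placement are hypothetical, and real column stores use more elaborate placement and consistency settings.

```python
import hashlib
from typing import Any, Dict, List, Optional, Set


class Cluster:
    """Rows are partitioned by key hash and copied to several nodes."""

    def __init__(self, node_count: int, replication_factor: int) -> None:
        if not 1 <= replication_factor <= node_count:
            raise ValueError("replication factor must be in 1..node_count")
        self.nodes: List[Dict[str, Any]] = [{} for _ in range(node_count)]
        self.replication_factor = replication_factor
        self.down: Set[int] = set()  # indices of nodes simulated as failed

    def replicas_for(self, partition_key: str) -> List[int]:
        # The hash picks the first node; the copies go to the next nodes.
        digest = hashlib.md5(partition_key.encode("utf-8")).hexdigest()
        first = int(digest, 16) % len(self.nodes)
        return [(first + i) % len(self.nodes)
                for i in range(self.replication_factor)]

    def write(self, partition_key: str, row: Any) -> None:
        for node in self.replicas_for(partition_key):
            if node not in self.down:
                self.nodes[node][partition_key] = row

    def read(self, partition_key: str) -> Optional[Any]:
        # Any live replica that holds the row can answer the read.
        for node in self.replicas_for(partition_key):
            if node not in self.down and partition_key in self.nodes[node]:
                return self.nodes[node][partition_key]
        return None


cluster = Cluster(node_count=4, replication_factor=3)
cluster.write("15/1/2016", {"score": 9})
replicas = cluster.replicas_for("15/1/2016")

# Simulate the failure of the first replica; the row stays readable.
cluster.down.add(replicas[0])
still_readable = cluster.read("15/1/2016")
```

Partitioning determines which nodes are responsible for a row, and replication determines how many of them hold a copy, which is what keeps the row available when a node fails.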
NOSQL DATABASES
JURIDICAL CHALLENGES
Sourcing – Analysing – Using
Personal data protection
• Privacy
• Privacy compliance as competitive advantage
• Purpose limitation
• Profiling
• Right to correction and removal
Anti-discrimination
• Ethical issues
• Restrictions to automated decision making
• Gender Act
• Racism Act
• Anti-discrimination Act
Sourcing – Analysing – Using
Cloud
• Sharing personal data with third party, within group
• Store personal data on centralized system
• Principle prohibition with exceptions
Competition
• Pricing based on behaviour of the consumer
• Charging customers a different price for the same product
Data ownership
• IP protection for database owners
• Contractual protection
• Confidentiality
New rights for individuals
New obligations for companies
Stronger enforcement against infringements
THANK YOU
For your attention
Guy De Tré
[email protected]
Database, Document and Content Management
ddcm.ugent.be
UGAIN
UGent Academie voor Ingenieurs
Big data training programme
http://www.ugain.ugent.be/bigdata2017.htm