Innovations in databases - AIG

INNOVATIONS IN DATABASES
A technical and juridical perspective
Prof. Dr. Guy De Tré
Ghent University
Database, Document and Content Management
Big data management challenges
NoSQL database solutions
Juridical challenges
OUTLINE
BIG DATA MANAGEMENT CHALLENGES
Data whose characteristics are such that conventional information systems cannot handle them efficiently
BIG DATA: A DEFINITION…
• Volume: Big data
• Variety: Heterogeneous data
• Velocity: Fast data
• Veracity: Bad data
BIG DATA: FOUR MAIN CHARACTERISTICS
• Big data: scaling up to distributed data storage (availability vs. consistency)
• Heterogeneous data: avoid data transformations
• Fast data: avoid data processing overhead
• Bad data: data quality assessment and handling
NEW CHALLENGES FOR DATA MANAGEMENT SYSTEMS
• Key-value stores
• Document stores
• Column stores
NOSQL DATABASE SOLUTIONS
• Data schema (no database schema!)
Visitor opinions
Visitor ID, timestamp    Value
B1, 15/1:14u00           ‘room 1’
B1, 15/1:14u01           ‘room 1, not fun, too crowded’
B1, 15/1:14u02           ‘room 1, 2/10’
B2, 15/1:14u02           ‘room 1, Rembrandt is sublime’
B1, 15/1:14u03           ‘room 1, ’
B2, 15/1:14u03           ‘room 1, 9/10’
B1, 15/1:14u04           ‘room 2, more my thing, 7/10’
B2, 15/1:14u04           ‘room 1, more of this pls’
B3, 15/1:14u04           ‘room 1, such a crowd’
SQL + NoSQL
KEY-VALUE STORES
• Limited interaction via API
Get(B1, 15/1:14u02)
Result: ‘room 1, 2/10’
Put(B3, 15/1:14u05, ‘room 1, Wow!’)
Delete(B1, 15/1:14u02)
KEY-VALUE STORES
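The Get, Put and Delete calls can be mimicked with an in-memory dictionary. What follows is a minimal sketch, not the API of any particular product: the class name KeyValueStore and its method names are assumptions made for illustration. Keys are (visitor ID, timestamp) pairs and values are opaque strings that the store never interprets.

```python
from typing import Dict, Optional, Tuple

Key = Tuple[str, str]  # (visitor ID, timestamp), as in the slide's table


class KeyValueStore:
    """Hypothetical in-memory stand-in for a key-value store."""

    def __init__(self) -> None:
        self._data: Dict[Key, str] = {}

    def put(self, key: Key, value: str) -> None:
        # The value is stored as-is: there is no schema to validate against.
        self._data[key] = value

    def get(self, key: Key) -> Optional[str]:
        # Lookup is only possible by the complete key.
        return self._data.get(key)

    def delete(self, key: Key) -> None:
        # Deleting a missing key is silently ignored in this sketch.
        self._data.pop(key, None)


store = KeyValueStore()
store.put(("B1", "15/1:14u02"), "room 1, 2/10")
store.put(("B3", "15/1:14u05"), "room 1, Wow!")
result = store.get(("B1", "15/1:14u02"))
store.delete(("B1", "15/1:14u02"))
```

Because the value is opaque, a question such as "all opinions with a score below 5" cannot be answered by the store itself; the application would have to fetch values by key and parse them.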
• Data distribution (horizontal scaling – consistent hashing)
KEY-VALUE STORES
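Consistent hashing places both nodes and keys on a ring, and each key is stored on the first node found clockwise from its hash. A minimal sketch follows; the HashRing class, the node names and the choice of 100 virtual nodes per server are all assumptions made for illustration, not the scheme of a specific product.

```python
import bisect
import hashlib
from typing import Dict, List


def _hash(value: str) -> int:
    # Stable hash; Python's built-in hash() is randomised per process.
    return int(hashlib.sha256(value.encode("utf-8")).hexdigest(), 16)


class HashRing:
    """Hypothetical consistent-hashing ring with virtual nodes."""

    def __init__(self, nodes: List[str], vnodes: int = 100) -> None:
        self._vnodes = vnodes
        self._ring: List[int] = []        # sorted positions on the ring
        self._owner: Dict[int, str] = {}  # position -> node name
        for node in nodes:
            self.add_node(node)

    def add_node(self, node: str) -> None:
        # Each node occupies several positions to spread the load evenly.
        for i in range(self._vnodes):
            pos = _hash(f"{node}#{i}")
            self._owner[pos] = node
            bisect.insort(self._ring, pos)

    def node_for(self, key: str) -> str:
        # First ring position clockwise from the key's hash, wrapping around.
        idx = bisect.bisect_right(self._ring, _hash(key)) % len(self._ring)
        return self._owner[self._ring[idx]]


keys = [f"B{i}, 15/1:14u00" for i in range(1000)]
ring = HashRing(["node-a", "node-b", "node-c"])
before = {k: ring.node_for(k) for k in keys}
ring.add_node("node-d")
after = {k: ring.node_for(k) for k in keys}
moved = [k for k in keys if before[k] != after[k]]
```

Adding node-d only moves the keys that node-d takes over; all other keys stay where they were, which is what makes this scheme attractive for horizontal scaling.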
• Data schema (no database schema!)
SQL + NoSQL
DOCUMENT STORES
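• More advanced interaction via API
db.opinies.find()                                                // all opinions
db.opinies.find({plaats: "zaal 1"})                              // plaats (place) is "zaal 1" (room 1)
db.opinies.find().sort({score: 1})                               // all opinions, ascending by score
db.opinies.find({score: {$gt: 8}})                               // score greater than 8
db.opinies.find({score: {$gt: 8}, plaats: "zaal 1"})             // score > 8 AND plaats is "zaal 1"
db.opinies.find({$or: [{score: {$gt: 8}}, {plaats: "zaal 1"}]})  // score > 8 OR plaats is "zaal 1"
DOCUMENT STORES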
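To make the filter semantics concrete, the sketch below evaluates equality, $gt and $or conditions against plain Python dictionaries. It is an illustration only: the functions matches and find, the sample documents and the bezoeker field are assumptions, and this is not how a document store is implemented internally.

```python
from typing import Any, Dict, List

Document = Dict[str, Any]


def matches(doc: Document, query: Dict[str, Any]) -> bool:
    """Return True if doc satisfies every condition in query (implicit AND)."""
    for field, condition in query.items():
        if field == "$or":
            # At least one of the sub-queries must match.
            if not any(matches(doc, sub) for sub in condition):
                return False
        elif isinstance(condition, dict) and "$gt" in condition:
            value = doc.get(field)
            # A document without the field never satisfies a comparison.
            if value is None or not value > condition["$gt"]:
                return False
        elif doc.get(field) != condition:
            return False
    return True


def find(collection: List[Document], query: Dict[str, Any]) -> List[Document]:
    return [doc for doc in collection if matches(doc, query)]


# Hypothetical documents; they do not all have the same fields.
opinies = [
    {"bezoeker": "B1", "plaats": "zaal 1", "score": 2},
    {"bezoeker": "B2", "plaats": "zaal 1", "score": 9},
    {"bezoeker": "B1", "plaats": "zaal 2", "score": 7},
    {"bezoeker": "B3", "plaats": "zaal 1"},  # no score given
]

everything = find(opinies, {})
both = find(opinies, {"score": {"$gt": 8}, "plaats": "zaal 1"})
either = find(opinies, {"$or": [{"score": {"$gt": 8}}, {"plaats": "zaal 1"}]})
```

The document without a score is skipped by the $gt condition rather than causing an error, which reflects the schema-less nature of the collection.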
• Data distribution (horizontal scaling – sharding)
Replica sets with Master/Slave replication
DOCUMENT STORES
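Sharding and replication can be combined as follows: a router sends each document to one shard based on its shard key, and every shard is a replica set in which writes go to the master (primary) and are copied to the slaves (secondaries). The sketch below assumes range-based routing on a visitor ID; the class names, shard names and immediate replication are simplifications chosen for illustration.

```python
import bisect
from typing import Dict, List, Optional


class ReplicaSet:
    """Hypothetical replica set: one primary copy plus secondaries."""

    def __init__(self, name: str, secondaries: int = 2) -> None:
        self.name = name
        self.primary: Dict[str, dict] = {}
        self.secondaries: List[Dict[str, dict]] = [{} for _ in range(secondaries)]

    def write(self, key: str, doc: dict) -> None:
        # Writes go to the primary and are then copied to each secondary.
        # Real systems usually replicate asynchronously; here it is immediate.
        self.primary[key] = doc
        for copy in self.secondaries:
            copy[key] = dict(doc)

    def read(self, key: str, from_secondary: bool = False) -> Optional[dict]:
        source = self.secondaries[0] if from_secondary else self.primary
        return source.get(key)


class ShardedCollection:
    """Routes each document to a shard by ranges of the shard key."""

    def __init__(self, upper_bounds: List[str], shards: List[ReplicaSet]) -> None:
        # shards[i] holds keys < upper_bounds[i]; the last shard holds the rest.
        assert len(shards) == len(upper_bounds) + 1
        self._bounds = upper_bounds
        self._shards = shards

    def shard_for(self, key: str) -> ReplicaSet:
        return self._shards[bisect.bisect_right(self._bounds, key)]

    def insert(self, key: str, doc: dict) -> None:
        self.shard_for(key).write(key, doc)

    def find_one(self, key: str) -> Optional[dict]:
        return self.shard_for(key).read(key)


shards = [ReplicaSet("rs0"), ReplicaSet("rs1"), ReplicaSet("rs2")]
opinies = ShardedCollection(["B2", "B3"], shards)
opinies.insert("B1", {"plaats": "zaal 1", "score": 2})
opinies.insert("B2", {"plaats": "zaal 1", "score": 9})
opinies.insert("B3", {"plaats": "zaal 1"})
```

Each shard only holds part of the collection, so capacity grows by adding shards, while the secondaries inside a shard provide copies to fall back on.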
• Data schema (no database schema!)
SQL + NoSQL
COLUMN STORES
• SQL-like interaction via API
SELECT taal FROM Bezoeker WHERE naam = 'Yana'         -- language of the visitor named Yana
SELECT commentaar FROM Opinie WHERE score < 5         -- comments of opinions scoring below 5
SELECT COUNT(*) FROM Opinie WHERE dag = '15/1/2016'   -- number of opinions given on 15/1/2016
No relational-database-style joins are supported; the application has to perform them itself.
COLUMN STORES
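Since the store offers no joins, the application has to combine the results of separate queries itself. The sketch below does this with a simple hash join. The bezoeker_id column linking Bezoeker and Opinie, the second visitor and all row values are assumptions made for illustration; the slides do not show the actual table layout.

```python
from collections import defaultdict
from typing import Dict, List

# Hypothetical rows, as two separate SELECT statements might return them.
# The bezoeker_id column that links both tables is an assumption.
bezoekers = [
    {"bezoeker_id": "B1", "naam": "Yana", "taal": "nl"},
    {"bezoeker_id": "B2", "naam": "Tom", "taal": "en"},
]
opinies = [
    {"bezoeker_id": "B1", "commentaar": "too crowded", "score": 2},
    {"bezoeker_id": "B2", "commentaar": "Rembrandt is sublime", "score": 9},
    {"bezoeker_id": "B1", "commentaar": "more my thing", "score": 7},
]


def hash_join(left: List[dict], right: List[dict], key: str) -> List[dict]:
    """Join two row lists on `key` inside the application (a hash join)."""
    index: Dict[str, List[dict]] = defaultdict(list)
    for row in left:
        index[row[key]].append(row)
    joined = []
    for row in right:
        # Rows without a partner on the left side are dropped (inner join).
        for match in index.get(row[key], []):
            joined.append({**match, **row})
    return joined


rows = hash_join(bezoekers, opinies, "bezoeker_id")
low_scores = [(r["naam"], r["commentaar"]) for r in rows if r["score"] < 5]
```

The cost of this approach is that both result sets travel to the application first, so in practice data is often denormalised to avoid the join altogether.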
• Data distribution (horizontal scaling – partitioning and replication)
Horizontal partitioning and replication
COLUMN STORES
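Horizontal partitioning assigns each row to a node based on its partition key, and replication stores copies of that row on further nodes so that a read still succeeds when one node fails. A minimal sketch, assuming the day (dag) is the partition key; the Cluster class, the placement rule (copies on the next nodes in sequence) and the replication factor of 2 are illustrative assumptions.

```python
import hashlib
from typing import Dict, List, Set


def partition_of(partition_key: str, n_partitions: int) -> int:
    # A stable hash of the partition key decides which partition owns the row.
    digest = hashlib.sha256(partition_key.encode("utf-8")).hexdigest()
    return int(digest, 16) % n_partitions


class Cluster:
    """Hypothetical cluster: one partition per node, copied to the next nodes."""

    def __init__(self, n_nodes: int, replication_factor: int) -> None:
        self.n_nodes = n_nodes
        self.rf = replication_factor
        self.nodes: List[Dict[str, List[dict]]] = [{} for _ in range(n_nodes)]
        self.down: Set[int] = set()  # indices of failed nodes

    def replicas(self, partition_key: str) -> List[int]:
        first = partition_of(partition_key, self.n_nodes)
        return [(first + i) % self.n_nodes for i in range(self.rf)]

    def insert(self, partition_key: str, row: dict) -> None:
        # Every replica receives its own copy of the row.
        for node in self.replicas(partition_key):
            self.nodes[node].setdefault(partition_key, []).append(dict(row))

    def select(self, partition_key: str) -> List[dict]:
        # Read from the first replica that is still up.
        for node in self.replicas(partition_key):
            if node not in self.down:
                return self.nodes[node].get(partition_key, [])
        raise RuntimeError("all replicas are down")


cluster = Cluster(n_nodes=4, replication_factor=2)
cluster.insert("15/1/2016", {"commentaar": "too crowded", "score": 2})
cluster.insert("15/1/2016", {"commentaar": "Rembrandt is sublime", "score": 9})
first_replica = cluster.replicas("15/1/2016")[0]
cluster.down.add(first_replica)  # simulate a node failure
count = len(cluster.select("15/1/2016"))
```

The count for that day is still answered by the second replica after the first one fails, which is the availability benefit that replication buys.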
NOSQL DATABASES
JURIDICAL CHALLENGES
Sourcing, Analysing, Using
Personal data protection
• Privacy
• Privacy compliance as competitive advantage
• Purpose limitation
Antidiscrimination
• Restrictions to automated decision making
• Gender Act
• Racism Act
• Anti-discrimination Act
Ethical issues
• Profiling
• Right to correction and removal
Sourcing, Analysing, Using
Cloud
• Sharing personal data with third parties or within the group
• Storing personal data on a centralized system
• Prohibition in principle, with exceptions
Competition
• Pricing based on the behaviour of the consumer
• Charging customers a different price for the same product
Data ownership
• IP protection for database owners
• Contractual protection
• Confidentiality
New rights for individuals
New obligations for companies
Stronger enforcement against infringements
THANK YOU
For your attention
Guy De Tré
[email protected]
Database, Document and Content Management
ddcm.ugent.be
UGAIN
UGent Academie voor Ingenieurs
Big data training programme
http://www.ugain.ugent.be/bigdata2017.htm