INNOVATIONS IN DATABASES
A technical and juridical perspective
Prof. Dr. Guy De Tré
Ghent University, Database, Document and Content Management

OUTLINE
• Big data management challenges
• NoSQL database solutions
• Juridical challenges

BIG DATA MANAGEMENT CHALLENGES

BIG DATA: A DEFINITION
Data whose characteristics prevent them from being handled efficiently by conventional information systems.

BIG DATA: FOUR MAIN CHARACTERISTICS
• Volume: big data
• Variety: heterogeneous data
• Velocity: fast data
• Veracity: bad data

NEW CHALLENGES FOR DATA MANAGEMENT SYSTEMS
• Big data: scaling up to distributed data storage (availability vs. consistency)
• Heterogeneous data: avoiding data transformations
• Fast data: avoiding data-processing overhead
• Bad data: assessing and handling data quality

NOSQL DATABASE SOLUTIONS
• Key-value stores
• Document stores
• Column stores

KEY-VALUE STORES
Data schema (no database schema!) (Figure: SQL vs. NoSQL)

Example: visitor opinions (Bezoekersopinies), keyed by visitor ID (BID) and timestamp:

BID, timestamp    | Value
B1, 15/1:14u00    | 'room 1'
B1, 15/1:14u01    | 'room 1, not fun, too crowded'
B1, 15/1:14u02    | 'room 1, 2/10'
B2, 15/1:14u02    | 'room 1, Rembrandt is sublime'
B1, 15/1:14u03    | 'room 1, '
B2, 15/1:14u03    | 'room 1, 9/10'
B1, 15/1:14u04    | 'room 2, more my thing, 7/10'
B2, 15/1:14u04    | 'room 1, more of this pls'
B3, 15/1:14u04    | 'room 1, so crowded'

Limited interaction via an API: on the same table, values can only be read, written or removed by key.
• Get(B1, 15/1:14u02) returns 'room 1, 2/10'
• Put(B3, 15/1:14u05, 'room 1, Wow!')
• Delete(B1, 15/1:14u02)

Data distribution: horizontal scaling through consistent hashing.

DOCUMENT STORES
Data schema (no database schema!)
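What "a data schema but no database schema" means can be illustrated by contrasting the key-value view above with a document view of similar visitor opinions. The Python sketch below is a hypothetical illustration, not a real store API: the helper names kv_get, kv_put, kv_delete and find, the document fields bid and commentaar, and all document contents are assumptions (plaats and score are the field names used in the document-store queries that follow).

```python
from typing import Any, Callable, Optional

# Key-value view (hypothetical helpers): composite key (BID, timestamp) -> opaque string.
# The store never interprets the value; it can only get, put or delete it by key.
kv: dict[tuple[str, str], str] = {
    ("B1", "15/1:14u02"): "room 1, 2/10",
    ("B2", "15/1:14u03"): "room 1, 9/10",
}


def kv_get(bid: str, ts: str) -> Optional[str]:
    """Return the value stored under (bid, ts), or None if the key is absent."""
    return kv.get((bid, ts))


def kv_put(bid: str, ts: str, value: str) -> None:
    """Insert or overwrite the value stored under (bid, ts)."""
    kv[(bid, ts)] = value


def kv_delete(bid: str, ts: str) -> None:
    """Remove (bid, ts) if present; deleting a missing key is a no-op."""
    kv.pop((bid, ts), None)


# Document view (hypothetical contents): no fixed database schema, so documents
# in the same collection may carry different sets of fields.
Doc = dict[str, Any]
opinies: list[Doc] = [
    {"bid": "B1", "plaats": "zaal 1", "score": 2},
    {"bid": "B2", "plaats": "zaal 1", "commentaar": "Rembrandt is sublime"},
    {"bid": "B2", "plaats": "zaal 1", "score": 9},
    {"bid": "B1", "plaats": "zaal 2", "commentaar": "more my thing", "score": 7},
]


def find(docs: list[Doc], predicate: Callable[[Doc], bool]) -> list[Doc]:
    """Return the documents for which predicate(doc) is true."""
    return [d for d in docs if predicate(d)]


# Because fields are optional, a filter must tolerate missing fields: documents
# without "score" simply do not match, as with a {score: {$gt: 8}} filter.
high_scores = find(opinies, lambda d: "score" in d and d["score"] > 8)
```

Here kv_get("B1", "15/1:14u02") returns "room 1, 2/10", and high_scores holds only the B2 document with score 9. Since key-value values are opaque strings, a question such as "all opinions with a score above 8" would force the application to fetch and parse every value; in the document view the filter can address the score field directly.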
(Figure: SQL vs. NoSQL)

Document stores offer more advanced interaction via an API. Example MongoDB-style queries on a collection opinies (opinions) with the fields plaats (location; "zaal" means room) and score:
• db.opinies.find(): all opinions
• db.opinies.find({plaats: "zaal 1"}): opinions about room 1
• db.opinies.find().sort({score: 1}): all opinions, sorted by ascending score
• db.opinies.find({score: {$gt: 8}}): opinions with a score above 8
• db.opinies.find({score: {$gt: 8}, plaats: "zaal 1"}): opinions about room 1 with a score above 8
• db.opinies.find({$or: [{score: {$gt: 8}}, {plaats: "zaal 1"}]}): opinions with a score above 8 or about room 1

Data distribution: horizontal scaling through sharding, with replica sets that use master/slave replication.

COLUMN STORES
Data schema (no database schema!) (Figure: SQL vs. NoSQL)

SQL-like interaction via an API, on the tables Bezoeker (visitor) and Opinie (opinion):
• SELECT taal FROM Bezoeker WHERE naam = 'Yana': the language (taal) of the visitor named Yana
• SELECT commentaar FROM Opinie WHERE score < 5: the comments (commentaar) with a score below 5
• SELECT COUNT(*) FROM Opinie WHERE dag = '15/1/2016': the number of opinions given on 15 January 2016 (dag = day)

Relational-style joins are not supported: the application has to perform them itself.

Data distribution: horizontal scaling through horizontal partitioning and replication.

NOSQL DATABASES

JURIDICAL CHALLENGES
Stages: sourcing, analysing, using.

Topics: personal data protection, antidiscrimination, ethical issues.
• Privacy
• Privacy compliance as a competitive advantage
• Purpose limitation
• Restrictions to automated decision making
• Gender Act, Racism Act, Anti-discrimination Act
• Profiling
• Right to correction and removal

Topics: cloud, competition, data ownership.
• Sharing personal data with a third party or within a group
• Storing personal data on a centralized system
• Prohibition in principle, with exceptions
• Pricing based on consumer behaviour
• Charging customers different prices for the same product
• IP protection for database owners
• Contractual protection
• Confidentiality

• New rights for individuals
• New obligations for companies
• Stronger enforcement of infringements

THANK YOU
For your attention
Guy De Tré, Database, Document and Content Management, ddcm.ugent.be

UGAIN, UGent Academie voor Ingenieurs: big data course, http://www.ugain.ugent.be/bigdata2017.htm
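The column-store slide notes that relational-style joins are not supported, so the application has to combine query results itself. Below is a minimal Python sketch of one common way to do that, an in-memory hash join. It assumes two result sets fetched with separate queries and a hypothetical shared column bid. The rows, the visitor "Tom" and the taal values are invented for illustration.

```python
from collections import defaultdict
from typing import Any

Row = dict[str, Any]

# Hypothetical query results; the join column "bid" and every value are invented.
bezoekers: list[Row] = [
    {"bid": "B1", "naam": "Yana", "taal": "nl"},
    {"bid": "B2", "naam": "Tom", "taal": "en"},
]
opinies: list[Row] = [
    {"bid": "B1", "commentaar": "too crowded", "score": 2},
    {"bid": "B2", "commentaar": "Rembrandt is sublime", "score": 9},
    {"bid": "B1", "commentaar": "more my thing", "score": 7},
]


def hash_join(build: list[Row], probe: list[Row], key: str) -> list[Row]:
    """Equi-join done in the application: index `build` on `key`, then probe it."""
    index: dict[Any, list[Row]] = defaultdict(list)
    for row in build:
        index[row[key]].append(row)
    joined: list[Row] = []
    for row in probe:
        # .get avoids inserting empty lists for keys that have no match
        for match in index.get(row[key], []):
            joined.append({**match, **row})  # merge columns; probe side wins on clashes
    return joined


# Roughly what the application computes instead of a relational join such as
# SELECT naam, taal, commentaar FROM Opinie JOIN Bezoeker USING (bid) WHERE score < 5
low = [r for r in opinies if r["score"] < 5]
report = [(r["naam"], r["taal"], r["commentaar"]) for r in hash_join(bezoekers, low, "bid")]
```

With these rows, report is [("Yana", "nl", "too crowded")]. Building the index on the smaller input (here, the visitors) keeps the application's memory use low.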