Uploaded by r.breeveld

Data Roadmap final

advertisement
Data Roadmap Sco, Thorne Mary Weisse 1/26/11 1 Agenda •
•
•
•
Current state and background Issues and problems Approach Next steps – issues to be addressed – issues deferred 1/26/11 2 MIT Data Warehouse -­‐ Overview • Most administraJve informaJon is available • Widely used and relied on • Recognized by peer insJtuJons as one of the best in higher educaJon • Improvements can make it more useful Examples: – adding more data – cleaning up data – real Jme data – true 24x7 availability 1/26/11 3 Why the Warehouse was Created • Primary reporJng plaRorm – provide a common access mechanism for all InsJtute wide administraJve data – easier access provided by a simpler view of data • Support departmental reporJng needs – ability to combine central and local data – reduce rekeying of data • Improve data quality through usage • Self service – minimize support & operaJonal costs – DLC's don’t need to rely on a central reporJng group 1/26/11 4 What Data is Available •
•
•
•
•
•
•
•
•
•
•
•
•
•
•
•
1/26/11 Academic Chairs Alumni Biographic, OccupaJon & Degree AssociaJon of Student AcJviJes Group, Requests and Space Benefits Budgets (Nimbus) CIP (ClassificaJon of InstrucJonal Programs) Concur Travel Coop Textbook Course Catalog Cybersource MIT Storefront transacJons Department Master Hierarchy (and other hierarchies) EHS Contact, Room set, Sarah, Training Data FaciliJes/Plant Maintenance Orders Fidelity – Benefits Financial Financial Aid •
•
•
•
•
•
•
•
•
•
•
•
•
•
Fund data (Resource Development) Graduate Admissions Graduate Awards HR (Employees, PosiJons, & Appointments) Cyborg & SAP IAP Credit & Non-­‐Credit AcJviJes Instructor InsJtuJonal Research -­‐ Honors & Awards Kerberos Keys Libraries (Financials, Orders, CirculaJons, CollecJon Management, Serials Control Processing) Metadata (Table, view, & field definiJons, data models; report templates) MIT Card – Pictures MIT ID Moira List data 5 What Data is Available (cont) •
•
•
•
•
•
•
•
•
•
•
•
•
•
•
•
1/26/11 OSP Proposals and Award Cost Sharing Overhead Rates Payroll (DeducJons, DistribuJons & CerJficaJons) Pension Person Pre-­‐registraJon & RegistraJon Procurement (Purchasing, RequisiJon & Credit Card) Request Tracker Roles /AuthorizaJon Space Student Biographic Student Degree Subject Enrollment Student Grades Training & Events TLO •
•
•
•
•
Telephone Detail (ice 9) Student Term Enrollment Undergrad Admissions VacaJon/Absences/Leaves Web StaJsJcal Data 6 Context Diagram Source Systems User Access Warehouse Environment Extract Transform & Load Files Web ReporJng Tool DBMS SQL Brio User Pull Files Metadata Extract 1/26/11 7 Issues and Problem CollecJon • Who –
–
–
–
–
Data Warehouse Community Survey Task Force Reports ReporJng Tool Review Commi,ee DW Staff Interview data users • InsJtuJonal, School, Department • Central, Research, Academic • What – Document • specific issues • brief analysis – IdenJfy next steps 1/26/11 8 Categories of Data Issues •
•
•
•
•
1/26/11 Warehouse infrastructure & operaJons ReporJng environment Missing data Data access policy Strategic reporJng (hard to answer quesJons) 9 Warehouse Infrastructure & OperaJons Issues idenJfied by the Warehouse Staff • Job scheduling – start Jme manually set – extra Jme used to pad – hard to maintain • 7x24 availability • Congested nightly loads – scheduling – full/incremental – a few intensive jobs 1/26/11 10 Warehouse Infrastructure & OperaJons • Performance of large detailed tables as history grows • Managing the daily operaJon – reduce the number of non standard processes • Incomplete DocumentaJon • Lack of Real Jme data feeds • Lack a process to build and deliver higher level informaJon 1/26/11 11 ReporJng Environment Issues • Data definiJons need improvement • Need intuiJve table and field names • Report descripJons aren't sufficient – hard to find the one you need – report names should be in the footer, so once you have paper you can find the report used to produce it – organize reports be,er – easy to idenJfy reports that need to be run by DLC monthly • Easier way to get the data from the Warehouse and format in Excel • Easy way to join data from mulJple domains 1/26/11 12 Data Missing from Warehouse • OSP – Pre-­‐Proposal – Award Budgets • Property • TLO – Basic • Alumni • ISO informaJon • Travel charges that haven't been submi,ed on a trip report • Travel Guest Card charges not available for DLC's • Historical Request Tracker InformaJon • Profit Center Group History • Building Data – Date Built and Date Renovated • Space Survey Data • Building DepreciaJon 1/26/11 13 Data Access Policy Issues • Room usage type • Appointment supplements • HR informaJon for people affiliated with labs & centers – paid but not appointed – faculty not charged or appointed 1/26/11 14 Strategic ReporJng (Hard to answer quesJons) • What does it cost MIT to deliver its programs? – Undergraduate EducaJon, Graduate EducaJon and Research – How do costs vary by school and program? – How do educaJon costs relate to tuiJon charged (sJcker price) and tuiJon collected (net tuiJon revenue)? – How do research costs relate to research revenues collected (direct and indirect)? • What does it cost MIT to own and maintain its space? – capital (interest, depreciaJon) – operaJonal (uJliJes, repair and maintenance costs) – How do costs vary by building and relate to market rates? 1/26/11 15 Strategic ReporJng (Hard to answer quesJons) • How much money has MIT received from all sources (gios, research revenues, other) in support of major areas such as Cancer Research or Energy compared to total resources used and needed? • What are MIT’s gross expenses by funcJon before internal charges? 1/26/11 16 State of Data PracJces • Current sooware project development does not always include – reporJng requirements – conceptual data model – data definiJons • Projects ooen focus on a transacJonal orientaJon – not what ways the informaJon might be used in the future 1/26/11 17 ReporJng Commi,ee Findings • IBM/COGNOS Tool Suite meets user requirements for funcJonality and look & feel. – Handling of local data – Web-­‐based soluJon • Data issues will not be resolved by a tool • Rollout of the tool will be a mulJyear effort • Data issues should be addressed as the tool is rolled out • Requires the involvement of the Business Owners and Community 1/26/11 18 Approach • Demonstrate to the community that we are working on all aspects of the problem • Create a repeatable process that can applied per subject • Gain experience with the new tool • Begin engaging Business Owners • Partner with Development Projects • Make demonstrable incremental progress 1/26/11 19 Areas of Focus • Improving ReporJng Infrastructure • Improving Development PracJces • Improving Data PracJces 1/26/11 20 Categories of Work • Day to day operaJons – upgrading the environment – monitoring the loads – supporJng minor changes • Housekeeping – Metadata maintenance – Warehouse documentaJon •
•
•
•
•
1/26/11 New reporJng tool administraJon Building reports Strategic quesJons Modeling New informaJon in Warehouse 21 Resources • Team – 2 Data Administrators – 2 Developers – 2 ReporJng Consultants • Project Teams • Business Owners 1/26/11 22 Next Steps 2011 • Improving ReporJng Infrastructure – migraJon to VM – develop plan for 24x7 – reporJng tool proof of concept • Improving Development PracJces – iniJal data modeling support • Improving Data PracJces – document problems & issues – Improve metadata documentaJon 1/26/11 23 Issues to be Addressed • Tested in reporJng tool proof of concept – be,er Excel integraJon – easier report delivery • automaJc noJficaJon and delivery – be,er report descripJons – be,er report organizaJon – simpler views of informaJon – simpler joining of informaJon • Short term soluJon for Singapore access 1/26/11 24 Work Deferred (Post FY2011) • Improving ReporJng Infrastructure – real Jme data – 24x7 implementaJon – Job scheduling • Improving Development PracJces – to be determined • Improving Data PracJces –
–
–
–
strategic (Hard to answer quesJons) define and publish guidelines define and publish roles & responsibiliJes document data access policies • New data in Warehouse – should be done with new tool 1/26/11 25 Improving Understanding of Data • Simpler more focused views of informaJon • Be,er definiJons and documentaJon • ReporJng tool proof of concept – including process for developing specialized views 1/26/11 26 Improving IS&T PracJces • Focus with the business on up front design – conceptual data model – business process models – reporJng requirements • Metadata documentaJon • Document reconciliaJon processes 1/26/11 27 Improving Data PracJces • Publish principles & guidelines for data management • Establish processes for data issue resoluJon and prioriJzaJon • Gain consensus and document roles & responsibiliJes concerning data • Develop a method to monitor progress 1/26/11 28 TIMELINE July 2011 Improving Warehouse Environment Migrate to Linux/VM Job Scheduling 7x24 ParJJoning Improving the Understanding of Data ReporJng Tool Pilot Rollout New Tool per Area Improving Meta Data Data DocumentaJon per Area Space Example Define Focused Views per Area Improving IS&T PracJces Establishing Modeling ConvenJons Improving Data PracJces Defining Roles & ResponsibiliJes 1/26/11 29 Appendix • Warehouse background – design principles – Warehouse processing – staJsJcs • Example of Addressing Issues in a single domain – Issues – short term plan – long term plan • Issues found so far grouped by subject 1/26/11 30 Data Warehouse Guiding Principles • InformaJon in a single place – access any informaJon in the same manner – integrated informaJon • join informaJon from several sources • Structures should support the needs of all types of users – Central, DLC, InsJtuJonal • Open access via SQL – not proprietary only access – uncommon in educaJon and industry – can be accessed programmaJcally • Warehouse is the data distribuJon hub – easy to verify when feeds and reports are from a single source 1/26/11 31 Data Warehouse Guiding Principles (CONT) • Data should be stable – consistency between reports generated on the same day – ability to generate the same report at any point in Jme • Data has to be accurate – stale data is be,er than inaccurate data • Structures should make it easy to report • Data is not corrected or modified in the Warehouse • Access rules are shared by the Warehouse and transacJonal systems – access managed by Business Owners 1/26/11 32 Data Warehouse Guiding Principles (CONT) •
•
•
•
Data always comes from a system of record Users should be able generate and share reports Metadata is available for all informaJon Access control done at the database level – enables direct access via any means • Metadata driven transformaJons and loads – to know how data was arrived at – same code called thousands of Jmes every night 1/26/11 33 Data Warehouse Process Stages Metadata Driven • Extract – minimal logic – full or Incremental files generated • Transfer – usually pushed automaJcally – encrypted/decrypted – simple integrity checking done • Convert – data into reporJng format and structures • star schemas • Load 1/26/11 34 Metadata 1/26/11 35 Integrity Checking in all Processing •
•
•
•
•
•
•
1/26/11 Correct files on hand before job runs Record & byte counts Comparisons of control file to data file Conversion type checking, number and types of fields DBMS constraints Error checking and validaJon rouJnes Mail sent to Warehouse & Business Owners 36 System Availability • Approaches 24 x 7 – except Saturdays from 3:00 pm – 3:10 pm • snapshot – some tables are temporarily unavailable in the early morning as new day’s data is loaded 1/26/11 37 StaJsJcs Jobs per night
Input files per night
Field transformaJons
DisJnct funcJons ~300 875 12,847 78 Tables
Views
Records
Total database size
1,522 1,150 1,904,948,496 575g 1/26/11 38 StaJsJcs Fiscal Year 2011 Quarter 2 (October1 2010 – December 31,2010) • 664 disJnct users logged into the Data Warehouse • RepresenJng 117 org units • GeneraJng 782,177 sessions 1/26/11 39 StaJsJcs DW User Community: Sessions by Department (400+ Sessions) FY 2011 Q2 730 840 808 753 547 615 492 483 Environment, Health & Safety Office HQ, InformaJon Services & Technology 451 441 Others (research affiliates and scienJsts, postdoctoral fellows, visiJng faculty/students/scholars, non-­‐current employees, etc.) Center for InternaJonal Studies 965 902 1151 Human Resources 1299 20428 1496 1752 Research Laboratory of Electronics Office of Sponsored Programs 2092 Sloan School of Management 2201 InformaJon Services & Technology 3167 Microsystems Technology Laboratories GL Ops & ReporJng & Journal Vouchers 3248 5413 10490 Budget, Finance and Treasury Sponsored & Genl A/R & Cashier 1/26/11 40 Data Issues Example (Space) • Buildings need to be mapped to their cost collectors manually • Building costs are collected via secondary posJngs – many of these are done manually requiring significant effort • UJlity usage metadata not available in the Warehouse • Flaws in the accuracy due to the fact that some uJlity meters cover more than one building • The older organizaJon unit idenJfiers (6 digit) make it hard to relate space informaJon to other domains • All systems do not use a standard format when referencing locaJon informaJon making integraJon difficult • Building built and last renovated dates are not shown in the Warehouse • Access to space informaJon in not consistent or documented • Building depreciaJon informaJon not in the Warehouse 1/26/11 41 Space Data Short Term Plan • Use new HR org unit numbers in the feeds and views of the Warehouse • Add the cost collector that relates to the building in the building feed – include the cost collector key in the Warehouse view • Add new fields to the building feeds and views for date built and date renovated • Add geocode fields to the room feed and views • Update the Warehouse metadata – add informaJon about uJlity usage 1/26/11 42 Space Long Term Plan • Work with FaciliJes and others to agree on guidelines for integraJng locaJon informaJon – building and room idenJfiers – geocoding • guidelines for tracking geocode for things like equipment • Work to gain consensus and document access policies concerning space informaJon • Ensure that space survey data it in the Warehouse • Encourage reporJng of space informaJon from the Warehouse where appropriate – possible pilot web reporJng • Bring building depreciaJon informaJon into the Warehouse – create an integrate view of all space costs in the Warehouse 1/26/11 43 HR Data Issues •
Faculty promoJon and Tenure dates have to be cleaned up by hand –
•
•
CIP (Academic Discipline) a,ached to All Academic Staff Race & Ethnicity not consistent over Jme –
•
•
•
•
•
•
•
1/26/11 How many researchers have PHD? Many things could be calculated in a standard way to make it easier and reduce discrepancies. –
•
self idenJfied ethnicity needs to be managed and synchronized with InsJtuJonally idenJfied Historical data is inaccurate because old records are changes without the modified date and effecJve dates properly maintained Many places where more than one affiliaJon occurs but always need one marked "Primary" to support non-­‐duplicated counts HR data not accurate. –
•
SAP & business policy does not support correcJon of some informaJon For example: years at MIT, could be provided in addiJon to the appointment dates. There seem to be different tallies for the Annual Head Count Report (October 31st) which cause confusion. Inaccuracies in Faculty appointment dates Inaccuracies in Faculty Leave informaJon Ethnicity & Race not maintained accurately How many FTE by FY for a DLC for the last 10 years 44 Finance Data Issues •
Hard to maintain CEMIT Groups –
–
•
Difficult to separate out internal charging and transfers –
–
–
•
•
1/26/11 no common definiJon of what "deficit" accounts are GL_Account_Report is confusing. Travel charges that haven't been submi,ed on a trip report can't be reported on Guest Card charges not available for DLCs Sponsor Billing done at Parent WBS level, so children look in deficit No common definiJon of what "deficit" accounts are –
•
•
some specific GL_Accounts for internal that can be excluded for this view could use SI transacJons instead of SA, but no way to ensure people do that correctly no way to enforce integrity rule in SAP Many Document Types not used, but not filtered out of list Sponsor Billing done at Parent WBS level, so children look in deficit –
•
•
•
•
can't tell if they're used • so hesitant to delete them Need to verify that they are all correct each month • maybe an exclusion list would help (to see if the right gl_accounts are excluded and only those) • mistakes lead to reporJng errors billing lag, means DLCs and Central might have different views and definiJons Hard to see if changes in a Person's status from ON/OFF Campus have been made before the posJngs Need Profit Center Group History so that we can recreate old reports and understand the differences in numbers 45 Student System Issues • What is the Cost of EducaJon? – By Department and Program • per Student – How much TuiJon is Charged? • What is waived? – How much of TuiJon is paid? • by the Student • by a Sponsor • Don't have accurate info on Grads. –
–
–
–
1/26/11 Where did they get their undergrad degree? When did a student enter in which degree program? They have Grad entry term, but not cohort. Are they fully supported (external sources hard)? 46 
Download