INFO 340 - March 31, 2004
Class 2 Notes
By: Ryan Prins

Museum as Information System
> Is it valid?
> Benefactor (--> Stakeholder?)
> Who is the stakeholder?

DBMS: Like a Student Information System
> Data: Structured (Formal Roles)
> Purpose: Well-defined
> Users: Specific Job Roles

IR: Web Search Engine
> Data: Unstructured (Wild West - Can't count on anything)
> Purpose: ?
> Users: ?

A DBMS would have less "garbage" than an IR system (e.g. the junk that Google has to look through). No one knows who the IR users are, but the DBMS users are defined and known.

Key data structure in a web search engine: Inverted File
Key data structures in a DBMS: B-trees, etc.

Two Key Ideas
> Both systems separate into three layers
> Because people are intimately involved, these are communication systems

Development Lifecycle
> Define --> Design --> Develop --> Deploy --> back to Define (starts over)

Example: Database Development
1. Analysis of functional requirements
2. Conceptual design (what the entities, attributes, and relationships are)
3. Logical design
4. Physical design (performance is the key issue)
5. Implement
6. Test
7. Maintain

What percentage of information systems fail? 60-70%
Examples:
> Denver Airport - 30M/month and it was 24 months late
> London Ambulance Service Computer Aided Dispatch - 2.3M (perhaps 20-30 lives lost)
> Hershey Foods (Enterprise Resource Planning) - 300M

Why such a high failure rate?
> Poor Analysis
> Human Error
> Communication mistakes between designers and users
> Overconfidence
> Financial Restrictions
> Poor test planning and deadlines
> Looking at technology as a silver bullet

Key Lessons For Project
> Project Management - Have a plan, tasks, dates, responsibilities
> Objectives - Be very clear on project objectives
> Pay attention to lessons learned - Anticipate resistance within your situation
> Project report - Write concisely and carefully, and have peers review your work

File Systems
> Application programs manage their own data files and produce reports
> Collection of programs was often based on functional areas

Weaknesses of File Systems
> Program-data dependence
> Separation and isolation of data
> Incompatibility of files
> Redundant data across many systems
> Ordering information based on need
> Accuracy of data contained

Key Lessons Learned From File Systems
1. Program-data independence is good
   - Programs should not be responsible for the definition of data formats
2. Centralized control of data access is good
   - Programs should not be responsible for security, access control, and certain kinds of data integrity

Hierarchical Data Model
> Same as Network model, except hierarchical

Network Data Model
> Collections of 'records'
> Pointers used to create 'sets'

Lessons Learned
> Better on
  - Data independence
  - Sharing data
> However, complex application programming
  - Chasing pointers to navigate data

2nd Generation: Relational Model
> Data modeled as tables, rows, columns
> No pointer chasing
> Grounded in theory (relational algebra)

3rd Generation: Object-Oriented Database Management Systems
> Domain objects (entities, relationships, etc.) modeled directly rather than with tables, rows, columns
> Why the need for 3G?
  - The unknowns
  - How to manage all of the attributes of a specific object

Data Impedance Problem: Converting hierarchical data into table/row data.
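
Sketch of the Data Impedance Problem (not from class): a minimal Python example, assuming SQLite as a stand-in relational DBMS. The student record, the table names (student, enrollment), and the helper names (flatten, load, rebuild) are all made up for illustration. It shows one nested record being split into table rows and then put back together with a join rather than by chasing pointers.

```python
import sqlite3

# Hypothetical hierarchical record: one student with nested enrollments.
student = {
    "id": 1,
    "name": "Pat",
    "enrollments": [
        {"course": "INFO 340", "grade": 3.7},
        {"course": "INFO 310", "grade": 3.4},
    ],
}


def flatten(record):
    """Split one nested record into a parent row and a list of child rows."""
    student_row = (record["id"], record["name"])
    enrollment_rows = [
        (record["id"], e["course"], e["grade"]) for e in record["enrollments"]
    ]
    return student_row, enrollment_rows


def load(conn, record):
    """Store the flattened rows in two tables linked by student_id."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS student (id INTEGER PRIMARY KEY, name TEXT)"
    )
    conn.execute(
        "CREATE TABLE IF NOT EXISTS enrollment ("
        "student_id INTEGER REFERENCES student(id), course TEXT, grade REAL)"
    )
    student_row, enrollment_rows = flatten(record)
    conn.execute("INSERT INTO student VALUES (?, ?)", student_row)
    conn.executemany("INSERT INTO enrollment VALUES (?, ?, ?)", enrollment_rows)


def rebuild(conn, student_id):
    """Reassemble the nested shape with a join instead of following pointers."""
    rows = conn.execute(
        "SELECT s.id, s.name, e.course, e.grade "
        "FROM student s JOIN enrollment e ON e.student_id = s.id "
        "WHERE s.id = ? ORDER BY e.rowid",
        (student_id,),
    ).fetchall()
    if not rows:
        return None  # simplification: a student with no enrollments is not returned
    return {
        "id": rows[0][0],
        "name": rows[0][1],
        "enrollments": [{"course": r[2], "grade": r[3]} for r in rows],
    }


conn = sqlite3.connect(":memory:")
load(conn, student)
print(rebuild(conn, 1) == student)
```

The translation code in flatten and rebuild is the "impedance" cost: the application thinks in nested objects, the relational model thinks in flat rows, and something has to convert between the two in both directions.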

Three-level ANSI-SPARC architecture
> External Level (Views & Users)
> Conceptual Level (Conceptual Schema)
> Internal Level (Internal Schema)

Data Independence: Making changes at one level so that they do not affect the other levels.

External Level
> Different users require different data views
  - Specific information for goals, job roles, etc.
> Some information is derived/calculated
  - Dynamic calculations (age)
  - Complex combinations of data

Conceptual Level
> What data is stored and the relationships between the data
> Key concerns:
  - Entities, attributes, relationships
  - Data types
  - Constraints
  - Security and integrity info

Internal Level (concerned w/ performance)
> How the data is stored
  - Optimal run-time performance
  - Optimal space utilization
> Key concerns:
  - Storage space for data and indices
  - Record size and placement
  - Data compression and encryption

Schemas: Contain information for mapping from one level to the next
> Example: Name (External) --> Record for name (Conceptual)
> Example: Long field names (Conceptual) --> Compression of this field (Internal)

Class Exercise
> External Concerns
  - Submitting the data (e.g. submitting a picture)
> Conceptual Concerns
  - Filtering data (e.g. pics of buildings)
  - Extra data that is not relevant to the user is withheld from the external level
> Physical Concerns
  - Link similar sets of data (those characterized by a building or a group of people), categorize them by time, and store them in files or something similar. The times would reference groups of things that occurred around that specific time.

How do we define place? It is difficult to come to a good understanding of what this means. Do you require a longitude and latitude? Zip code? Just a description?

Functions of DBMS
1. Data storage, retrieval, and update
2. A user-accessible catalog
3. Transaction support
4. Concurrency control
5. Recovery services
6. Authorization services
7. Support for data communication
8. Integrity services
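
Sketch of a few of the DBMS functions listed above (not from class): a minimal Python example, assuming SQLite as a stand-in DBMS. The account table and the transfer/balances helpers are hypothetical. It touches only functions 1 (storage, retrieval, update), 2 (catalog), 3 (transaction support), and 8 (integrity services); concurrency control, recovery, authorization, and data communication are not shown.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Integrity services: the rule "balance may not go negative" is declared
# once in the schema, not re-checked by every application program.
conn.execute(
    "CREATE TABLE account ("
    "id INTEGER PRIMARY KEY, owner TEXT NOT NULL, "
    "balance INTEGER NOT NULL CHECK (balance >= 0))"
)
conn.executemany(
    "INSERT INTO account VALUES (?, ?, ?)", [(1, "A", 100), (2, "B", 50)]
)
conn.commit()


def transfer(conn, src, dst, amount):
    """Move amount between two accounts as one all-or-nothing transaction."""
    try:
        with conn:  # commits on success, rolls back if an exception is raised
            conn.execute(
                "UPDATE account SET balance = balance + ? WHERE id = ?",
                (amount, dst),
            )
            conn.execute(
                "UPDATE account SET balance = balance - ? WHERE id = ?",
                (amount, src),
            )
        return True
    except sqlite3.IntegrityError:
        # The CHECK constraint rejected the debit, so the credit is undone too.
        return False


def balances(conn):
    """Retrieval: return {account id: balance}."""
    return dict(conn.execute("SELECT id, balance FROM account ORDER BY id"))


print(transfer(conn, 1, 2, 30), balances(conn))   # valid transfer
print(transfer(conn, 1, 2, 500), balances(conn))  # overdraft is refused

# User-accessible catalog: the DBMS describes its own tables.
print(conn.execute("SELECT name FROM sqlite_master WHERE type = 'table'").fetchall())
```

In the refused transfer the credit to account 2 runs first and is then rolled back when the debit fails, so no half-finished update is left behind. That is the point of having the DBMS, not each program, own transactions and integrity rules.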