ITGS Chapter 3 Terms Flashcards | Quizlet

DIKW Pyramid

Data, information, knowledge, and wisdom (DIKW) pyramid

  • Hierarchy that represents the structural and functional relationships between data, information, knowledge, and wisdom
  • Each step answers different questions about the initial data and adds value to it

Data

A collection of facts in raw, neutral, unorganized form
Ex.:

  • numbers, characters,
  • 3.14
  • A, O, d, w, 1
    Without context, data means nothing

Information

  • Data becomes information when it is processed into a format that is logical, error-free, and easier to measure, visualize, and analyse for a specific purpose
  • Aggregation: combining different data sets to compare and contrast them
  • Validation: to prove that the collected data is relevant & accurate
  • Ex. stock market index (Tesla’s earnings report comes in lower than expected, stock market crashes) (Great Depression 2: Electric Boogaloo)
  • Information can be derived from data when we ask who, what, where, when
  • But it does not tell us HOW
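
A minimal sketch of the validation and aggregation steps above, in Python. The readings, the plausible range, and the names `raw_readings` and `is_valid` are assumptions made up for illustration, not part of the course material.

```python
from statistics import mean

# Raw data: numbers with no context (hypothetical classroom temperatures in °C).
raw_readings = [21.5, 22.0, -999.0, 21.8, 22.4]

def is_valid(reading: float) -> bool:
    """Validation: keep only readings inside an assumed plausible range."""
    return -50.0 <= reading <= 60.0

# Validation drops the sensor error code (-999.0).
valid_readings = [r for r in raw_readings if is_valid(r)]

# Aggregation combines the valid readings into one summary value.
average = mean(valid_readings)

# Information: the data now answers "what was the average temperature?"
print(f"Average classroom temperature: {average:.1f} °C")
```

The printed sentence answers a "what" question, but it still does not say how to act on it.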

Knowledge

  • A logical description of collected facts (data)
  • When we don’t just view information as a description of collected facts but also understand how to apply it to achieve our goals, we turn it into knowledge
  • When one entity (student, company, team) gains more knowledge than the other party, it gains the competitive edge (ex. the use of machine learning to enhance photo quality on Google Pixel phones)

Wisdom

  • Knowledge applied in action
  • Knowledge only becomes wisdom when you take action
  • If you learn something but then just sit there and do nothing, that knowledge doesn’t turn into wisdom

Types of Data

Qualitative

  • Non-statistical
  • Typically unstructured or semi-structured
    • Data not measured using numbers to derive a conclusion
    • Based on data properties, attributes, labels, etc.
    • Used as a start for asking “why” questions (ex. Why is the paper due tomorrow?)
    • Deals with non-numerical data (concepts, descriptions, meanings, words etc.)
  • Data generated from qualitative research is used for theorizing, interpretation, developing hypotheses, and initial understanding
  • Qualitative data can be subjective, which makes qualitative data analysis a complex process

Quantitative

  • Statistical
    • Typically structured and defined in nature
    • Measured using numbers and values, making it suitable for data analysis
    • Prompts the questions “how much” & “how many” with conclusive information (ex. How much time is left until I have to write the paper / how many Skittles will it take to bribe Mr. Poon to delay the paper)
    • There are two types of quantitative data: Continuous & discrete data

Continuous

  • Data that can be infinitely broken down into smaller parts or data that continuously fluctuates
  • Ex. your heart rate (very fast), the speed of a car while driving, your weight, your age

Discrete

  • Data that cannot be broken down into smaller parts
  • Consists of integers
  • Is finite and has a limit
  • Ex. how much money is in my wallet (none), how many hours are left until my death at the hands of this paper

Example

What is qualitative data? A bookcase...

  • Is made of wood
  • Has golden knobs
  • Was built in Italy

What is quantitative data? A bookcase...

  • Is 3 feet long
  • Weighs 100 pounds
  • Has 3 shelves and 2 cabinets
  • Costs $1500
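
The bookcase example can be sketched in Python by sorting each attribute by its value type. This is a simplification: the key names and the `classify` helper are hypothetical, and real data sets do not always map this cleanly onto text, whole numbers, and decimals.

```python
# The bookcase from the example above; key names are hypothetical.
bookcase = {
    "material": "wood",
    "knobs": "golden",
    "built_in": "Italy",
    "length_ft": 3.0,
    "weight_lb": 100.0,
    "shelves": 3,
    "cabinets": 2,
    "cost_usd": 1500,
}

def classify(value) -> str:
    """Classify a value by its Python type (a deliberate simplification)."""
    if isinstance(value, str):
        return "qualitative"
    if isinstance(value, int):
        return "quantitative (discrete)"
    return "quantitative (continuous)"

kinds = {name: classify(value) for name, value in bookcase.items()}
for name, kind in kinds.items():
    print(f"{name}: {kind}")
```

Counts such as shelves are stored as whole numbers (discrete), while measurements such as length and weight are stored as decimals because they can always be measured more finely (continuous).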

Cultural Data

  • Data related to art, humanities, and cultural studies
  • Digitization of items in museums (ex. 3D virtual tours, etc.)
  • Information on artifacts and history
    • Significance of history: creates community?
  • Culture makes up a large part of human life

Other Kinds of Data

  • Financial
    • Information related to the financial health of a business
    • Used by internal management to analyze business performance and outside people to judge its credit worthiness and whether to invest or not
    • Ex. assets, liabilities, owner’s equity, stock price
  • Geographical (Geodata)
    • Data and information that have an explicit or implicit association with a location relative to Earth
  • Medical (clinical)
    • Health related information that is associated with regular patient care
    • Helps healthcare providers monitor the wellbeing of their patients by monitoring any emerging trends
    • Ex. blood sugar level, heart rate, virus count within the body, etc.
  • Meteorological data
    • Meteorology: A branch of the atmospheric sciences (which includes atmospheric chemistry and physics) with a major focus on weather forecasting
    • Data consisting of physical parameters that are measured directly by instruments
    • Ex. temperature, dew point, wind direction, wind speed, etc.
  • Transport data
    • Designs and implements studies
    • Develops databases
    • Monitors trends and analyses data relating to all modes of travel
    • Ex. walking, cycling, public transport
  • Scientific data
    • Data based on research carried out by researchers
  • Statistical data
    • Data based on collected statistics

Metadata

Metadata is data that provides information about other data, but not the content of the data
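
A small Python sketch of the idea: the operating system keeps metadata about a file (name, size, modification time) that can be read without opening the file’s content. The temporary file and the keys of the `metadata` dictionary are assumptions for illustration.

```python
import os
import tempfile
from datetime import datetime, timezone

# Create a throwaway file so the example is self-contained.
with tempfile.NamedTemporaryFile("w", suffix=".txt", delete=False) as f:
    f.write("TOK essay draft")
    path = f.name

info = os.stat(path)  # reads metadata only, never the file's content

metadata = {
    "name": os.path.basename(path),
    "size_bytes": info.st_size,
    "modified": datetime.fromtimestamp(info.st_mtime, tz=timezone.utc),
}
print(metadata)

os.remove(path)  # clean up the temporary file
```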

Descriptive metadata

  • Used to identify specific data (ex. authors, titles, keywords)
  • Ex. a folder named “pain and suffering” that contains all the TOK essays

Structural metadata

  • Organization and classification of data for ease of access (ex. classifying the chapters of a book)
  • Ex. the essays being catalogued into different dates

Administrative metadata

  • Management of data including rights of who it belongs to (ex. the use and preservation of business data)
  • Ex. the IB diploma student who created the folder and all the files within

Reference metadata

  • Data that describes the quality of statistical data

Statistical metadata

  • Descriptive information about statistical data

Legal metadata

  • Underlying data relating to the legal case files for ease of access

Data Life Cycle

  1. Generation
    • First stage of data life cycle
    • Every action done digitally generates data
      • Every sale, purchase, hire, communication, interaction
    • Can also be entered manually or by completing an online form
    • The generated data can provide powerful insight
  2. Collection/extraction
    • Not all data that’s generated every day is collected or used
      • It is a decision based on the stakeholders’ perspectives
      • Ex. Roomba wants a map of everyone’s homes, while Amazon wants the buying patterns of its target market
  3. Processing / usage (beginning)
    • Once the data has been collected, it must be processed from its raw format (think DIKW pyramid)
    • Data wrangling: cleaning and transforming raw data into useful and accessible data
    • Data compression: transforming data into a format that can be stored efficiently (ex. PDFs)
    • Data encryption: transforming data into another code to prevent easy access
  4. Storage (beginning)
    • Once processed and cleaned, the data needs to be stored (creating a database) in a safe place with a preset level of security and access so that not everyone can see the data (information is precious after all). It can be stored in the cloud, on servers, or in local files (physical storage on your computer / hard drives)
  5. Management (the middle)
    • Aka database management
    • Involves organizing, storing, and retrieving data as necessary over the life of a data project
    • It is an ongoing process from beginning to the end of the data cycle (everything from storage and encryption to implementing access logs)
  6. Analysis (the middle)
    • Turns raw and/or processed data into meaning (patterns, trends, forecasts)
    • Some tools and methods include statistical modeling, algorithms, data mining, and two other IT forms (related to the podcast) 
    • Ex. there’s a rise in sleep deprivation at the beginning of November, which coincidentally is also when a new song came out and when a new 5-star character in an online game was released (gee I wonder what game)
  7. Visualization (optional depending on the situation) (the middle)
    • Turning the analyzed data into visual representations of your information
    • Visualizing data makes it easier to quickly communicate your analysis to a wider audience both inside and outside your organization
    • Ex. bar graphs
  8. Interpretation (the middle)
    • This process is to make sense of your data analysis
    • Beyond simply presenting data, this is when you investigate and try to connect two analyses
    • Ex. in reference to the previous sleep deprivation example, gamers sleep less (data 1) as they sacrifice sleep grinding for the new 5-star character that was released in November (data 2)
  • Preservation (not on the chart) (the end)
    • After step 8, you want to publicize this as a fact to the world. While you do that, you want to ensure that the data is preserved and that the quality remains pristine. To do this you need to find a spot to keep your data (on the cloud or on a physical hard drive)
  • Destruction (also not on the chart) (the end)
    • No data can be kept forever, as the data will eventually become obscure and outdated
    • Depending on the company’s data policy, you may be asked to destroy the data so that it can never be retrieved (ex. When an employee leaves a company, their emails, logins, etc. are all scrapped)
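
A rough Python sketch of two of the processing steps from stage 3 (wrangling and compression). The survey answers are made up, and `zlib` is just one standard-library way to compress; the notes do not name a specific tool.

```python
import zlib

# Hypothetical raw survey answers: inconsistent spacing, casing, and blanks.
raw = ["  Alice ", "BOB", "", "alice", " Carol"]

# Data wrangling: strip whitespace, normalise case, drop blanks and duplicates.
clean = sorted({name.strip().title() for name in raw if name.strip()})

# Data compression: encode to bytes, then compress losslessly for storage.
# The text is repeated 100 times only so the size difference is visible.
original = ("\n".join(clean) * 100).encode("utf-8")
compressed = zlib.compress(original)

# Decompression restores the exact original bytes.
restored = zlib.decompress(compressed)

print(clean)
print(len(original), "bytes before,", len(compressed), "bytes after")
```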

Data Integrity

  • A concept and process that ensures the accuracy, completeness, consistency, and validity of an organization’s data
  • Protecting data integrity is important because it keeps the data authentic and transparent
  • It also eliminates bias, confusion, and provides valid data that is genuine and factual

Commission

  • Creating data from nothing
  • Ex. doing a lab but you screwed up and instead made up data on the spot for completion marks

Omission

  • Removing, reducing, or erasing data (choosing specifically what data looks better on that lab you screwed up on)

Manipulation

  • Changing the data, modifying the data to look more appealing (changing values on the lab so that it looks even more authentic)
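
One common way to notice omission or manipulation after the fact is to store a hash (a fingerprint) of the data when it is first recorded. The sketch below assumes made-up lab values and a hypothetical `fingerprint` helper. It cannot catch commission, because made-up data hashes just as happily as real data.

```python
import hashlib

def fingerprint(values: list[float]) -> str:
    """Return a SHA-256 digest of the values in their recorded order."""
    text = ",".join(repr(v) for v in values)
    return hashlib.sha256(text.encode("utf-8")).hexdigest()

lab_results = [9.79, 9.81, 9.92]      # hypothetical measurements
recorded = fingerprint(lab_results)   # stored when the data is collected

manipulated = [9.79, 9.81, 9.81]      # one value changed afterwards
omitted = [9.79, 9.81]                # one value removed afterwards

unchanged_ok = fingerprint(lab_results) == recorded
manipulated_ok = fingerprint(manipulated) == recorded
omitted_ok = fingerprint(omitted) == recorded
print(unchanged_ok, manipulated_ok, omitted_ok)
```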

Primary vs Secondary Data

Primary data

  • Actual picture of a document
  • Data from an original source
  • First collection of the data (ex. You do a lab and collect data which you use yourself. If a classmate copies from you, then that copied data is now secondary data)

Secondary data

  • Data that has been collected by someone else
  • Not a primary source of data
  • Ex. the friend you copied off who turns out copied from someone else

Database

Databases and their structure

Relational Database

  • A classic and frequently used database
  • Has more than one table, as it is meant to show the relationships between different tables
    • Each row represents one entity, as that row holds the data for that entity
    • Different data is sorted into different tables. A primary key is a column (or group of columns) whose value is unique for every row, so it identifies each row
    • A foreign key is a column or group of columns in a relational database that provides a link between data in two tables
    • When a table’s primary key is stored in another table to refer back to it, it acts as a foreign key in that other table
    • The model also accounts for the types of relationships between those tables, including:
      • One to one 
      • One to many
      • Many to many
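
A minimal sketch of primary keys, foreign keys, and a one-to-many relationship, using Python’s built-in `sqlite3` module. The `student` and `essay` tables and their contents are invented for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces foreign keys only when enabled

conn.execute("CREATE TABLE student (id INTEGER PRIMARY KEY, name TEXT NOT NULL)")
conn.execute(
    "CREATE TABLE essay ("
    " id INTEGER PRIMARY KEY,"
    " title TEXT NOT NULL,"
    " student_id INTEGER NOT NULL REFERENCES student(id))"
)

conn.execute("INSERT INTO student (id, name) VALUES (1, 'Ana')")
conn.executemany(
    "INSERT INTO essay (title, student_id) VALUES (?, ?)",
    [("Knowledge and language", 1), ("Ethics of data", 1)],
)

# One to many: a JOIN follows the foreign key from each essay to its student.
rows = conn.execute(
    "SELECT student.name, essay.title FROM essay"
    " JOIN student ON essay.student_id = student.id ORDER BY essay.id"
).fetchall()
print(rows)

# A foreign key pointing at a student that does not exist is rejected.
try:
    conn.execute("INSERT INTO essay (title, student_id) VALUES ('Orphan', 99)")
    rejected = False
except sqlite3.IntegrityError:
    rejected = True
print(rejected)
```

Here `student.id` is the primary key of one table and `essay.student_id` is the foreign key that links the second table back to it: one student, many essays.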

Reduce Data Error

  • Use of Validation
    • Means that only suitable/valid data can be entered into the database
  • Use of Verification
    • Used to check that the data entered is actually the correct data you want in the database
    • This can be easily done using methods such as peer review.
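
A short Python sketch of the difference. The age range of 0 to 120 is an assumed rule, and double entry (typing the value twice) is used here as a stand-in verification method; the notes themselves only mention peer review.

```python
def validate_age(text: str) -> int:
    """Validation: accept only whole numbers in an assumed range of 0 to 120."""
    if not text.isdecimal():
        raise ValueError(f"not a whole number: {text!r}")
    age = int(text)
    if age > 120:
        raise ValueError(f"out of range: {age}")
    return age

def verify_double_entry(first: str, second: str) -> bool:
    """Verification: the same value typed twice must match."""
    return first == second

print(validate_age("17"))
print(verify_double_entry("17", "71"))
```

Note that "71" would pass validation (it is a suitable age) but fail verification (it is not the value that was meant), which is why both checks are used.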

Data Security

Why is data security important?

Protects a company from financial loss, reputational harm, and loss of customer trust

Side note: whenever reading articles, always identify at least 2 key stakeholders

Solutions:

  • Data encryption
  • Enhanced data security
  • Data masking - turn data into fictional data

Methods of Protecting Data

Data encryption

  • Process of converting data into unreadable code to prevent unauthorized parties from accessing the data

Data masking

  • Process of turning data into some other fictional data to “mask” the real data from being accessed even if hackers got access to the data (ex. Turning company data into data about a fairytale)
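
A sketch of one simple form of masking, character substitution, in Python. The function name and the card number are made up. Swapping real values for fictional ones, as described above, follows the same idea of hiding the real data while keeping its shape.

```python
def mask_card_number(card_number: str) -> str:
    """Replace every digit except the last four with '*', keeping the layout."""
    total_digits = sum(ch.isdigit() for ch in card_number)
    seen = 0
    masked = []
    for ch in card_number:
        if ch.isdigit():
            seen += 1
            # Only the final four digits stay readable.
            masked.append(ch if seen > total_digits - 4 else "*")
        else:
            masked.append(ch)  # keep spaces and dashes as they are
    return "".join(masked)

print(mask_card_number("4111 1111 1111 1234"))
```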

Data erasure

  • Destroying the data

Data Erasure

Physical

  • Degaussers: powerful electromagnetic fields that remove data from hard drives
  • Paper shredders
  • Burning it in a fireplace

Digital

  • Data erasure software completely removes the data from storage by overwriting it with ones and zeros

Differences

Deleting Data

  • You put your data in the trash bin but it’s still there
  • Even if you empty your recycle bin, the deleted data can still be retrieved with forensic tools (ex. by the FBI)

Erasing Data

  • Overwriting the data with meaningless binary (ones and zeros) so the original can no longer be recovered
  • ex. plastic surgery
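
The difference can be sketched in Python: deleting only removes the file’s entry, while erasing overwrites the bytes first. This is a simplified illustration with a hypothetical `overwrite_with_zeros` helper. On SSDs and some file systems an in-place overwrite is not guaranteed to hit the same physical storage, so real erasure tools do more than this.

```python
import os
import tempfile

def overwrite_with_zeros(path: str) -> None:
    """Erasing step: replace every byte of the file with a zero byte."""
    size = os.path.getsize(path)
    with open(path, "r+b") as f:
        f.write(b"\x00" * size)   # overwrite the content in place
        f.flush()
        os.fsync(f.fileno())      # ask the OS to push the write to the disk

# Demo on a throwaway file.
with tempfile.NamedTemporaryFile("wb", delete=False) as f:
    f.write(b"secret plans")
    demo_path = f.name

overwrite_with_zeros(demo_path)
with open(demo_path, "rb") as f:
    leftover = f.read()           # what a recovery tool would now find

os.remove(demo_path)              # deleting step: remove the file entry
print(leftover)
```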

Data Encryption

  • The process of encoding a message or information in such a way that only authorized parties can access it
  • VPN: encrypts the traffic between your device and the web
  • Ciphertext: the encrypted form of a message, which is not understandable without the key

Secret Key Encryption

Mailbox analogy: a mailbox only accessible to you, as only you have access and are able to put things in and take things out
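
A toy Python sketch of the secret key idea: the same key locks and unlocks the message. It uses a simple XOR with a random key as long as the message, purely for illustration. The names are hypothetical, and real systems use vetted ciphers such as AES rather than anything hand-rolled.

```python
import secrets

def xor_bytes(data: bytes, key: bytes) -> bytes:
    """XOR each data byte with the matching key byte (key must be as long as data)."""
    if len(key) != len(data):
        raise ValueError("key and data must have the same length")
    return bytes(d ^ k for d, k in zip(data, key))

message = b"meet at noon"
secret_key = secrets.token_bytes(len(message))  # shared by sender and receiver

ciphertext = xor_bytes(message, secret_key)     # encrypt with the secret key
recovered = xor_bytes(ciphertext, secret_key)   # decrypt with the same key

print(recovered)
```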

Public Key Encryption

Mailbox analogy: a mailbox that allows the public to put things in, but only you can retrieve the contents
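
A toy Python sketch of the public key idea using RSA with tiny textbook primes. Anyone who knows the public pair `(n, e)` can encrypt, but only the holder of the private exponent `d` can decrypt. The numbers are chosen for readability and are far too small to be secure.

```python
# Toy RSA with tiny primes: illustration only, not secure.
p, q = 61, 53
n = p * q                  # 3233, part of both keys
phi = (p - 1) * (q - 1)    # 3120
e = 17                     # public exponent
d = pow(e, -1, phi)        # private exponent: modular inverse (Python 3.8+)

def encrypt(m: int) -> int:
    """Anyone can do this with the public key (n, e)."""
    return pow(m, e, n)

def decrypt(c: int) -> int:
    """Only the holder of the private exponent d can undo it."""
    return pow(c, d, n)

message = 65
ciphertext = encrypt(message)
print(ciphertext, decrypt(ciphertext))
```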

Terms

Blockchain

  • A distributed database or ledger that is shared among the nodes of a computer network
    • Ledger: a book or collection of accounts in which account transactions are recorded
  • As a database, blockchains store information electronically in digital format
  • Best known for their crucial role in cryptocurrency systems, such as Bitcoin, for maintaining a secure and decentralized record of transactions
  • Guarantees the fidelity and security of a record of data and generates trust without the need for a trusted 3rd party

How does blockchain work?

The goal of a blockchain is to allow digital information to be recorded and distributed, but not edited

It is the foundation for unchanging ledgers, or records of transactions that cannot be altered, deleted, or destroyed

  • As new data comes in, it is entered into a fresh block
    • Once the block is filled with data, it is chained onto the previous block, which links the data together in chronological order
  • Decentralized blockchains are immutable/unchanging
    • This means the data entered is irreversible
    • For Bitcoin, this means transactions are permanently recorded and viewable to anyone
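
The chaining step can be sketched in a few lines of Python: each block stores the hash of the block before it, so editing an old block breaks the link that follows it. This is a bare hash chain with hypothetical function names and made-up transactions. It leaves out everything that makes a real blockchain decentralized (the network, consensus, mining).

```python
import hashlib
import json

def block_hash(block: dict) -> str:
    """Hash a block's contents, including the previous block's hash."""
    payload = json.dumps(block, sort_keys=True).encode("utf-8")
    return hashlib.sha256(payload).hexdigest()

def add_block(chain: list[dict], data: str) -> None:
    """Append a block that records the hash of the block before it."""
    previous = block_hash(chain[-1]) if chain else "0" * 64
    chain.append({"index": len(chain), "data": data, "previous_hash": previous})

def is_valid(chain: list[dict]) -> bool:
    """Check that every block still points at the true hash of its predecessor."""
    return all(
        chain[i]["previous_hash"] == block_hash(chain[i - 1])
        for i in range(1, len(chain))
    )

chain: list[dict] = []
for entry in ["Ana pays Ben 5", "Ben pays Cy 2", "Cy pays Ana 1"]:
    add_block(chain, entry)

valid_before = is_valid(chain)
chain[0]["data"] = "Ana pays Ben 500"   # tamper with an old record
valid_after = is_valid(chain)
print(valid_before, valid_after)
```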

How can blockchains be used for good and bad?

Good

  • Transparency and accountability
  • Financial inclusion
  • Improved data security and privacy
  • Efficient and transparent voting systems
  • Intellectual property rights protection

Bad

  • Illicit activities
  • Privacy concerns
  • Energy consumption
  • Regulatory challenges
  • System vulnerabilities and smart contract risks

Explanations (written by ChatGPT, our beloved):

Blockchain technology has the potential to be used for both positive and negative purposes. Here are examples of how blockchain can be used for good and bad:

Good:

  1. Transparency and accountability: Blockchain can enhance transparency by creating an immutable and auditable record of transactions or data. This can be particularly useful in areas such as supply chain management, where stakeholders can track the origin and journey of goods, ensuring ethical sourcing, fair labor practices, and quality control.
  2. Financial inclusion: Blockchain-based cryptocurrencies and decentralized finance (DeFi) platforms have the potential to provide financial services to the unbanked or underbanked populations. By enabling secure and low-cost transactions, blockchain technology can facilitate cross-border remittances and provide access to banking-like services, empowering individuals in economically underserved regions.
  3. Improved data security and privacy: Blockchain’s decentralized nature and cryptographic features can enhance data security and privacy. Personal information stored on a blockchain can be more resistant to hacking and unauthorized access. Additionally, blockchain-based identity management systems can provide individuals with control over their own data, allowing them to share information selectively and reducing reliance on centralized databases vulnerable to breaches.
  4. Efficient and transparent voting systems: Blockchain can facilitate secure and transparent voting systems by ensuring the integrity of votes and preventing tampering or fraud. Blockchain-based voting platforms can enhance trust in the electoral process, enable remote voting, and increase voter turnout.
  5. Intellectual property rights protection: Blockchain can be utilized to establish proof of ownership and protect intellectual property rights. Artists, writers, and inventors can use blockchain to establish a timestamped and verifiable record of their creations, providing evidence of originality and ownership.

Bad:

  1. Illicit activities: Blockchain’s pseudonymous and decentralized nature can be exploited for illicit activities such as money laundering, drug trafficking, and other forms of illegal transactions. Cryptocurrencies have been used in some cases to facilitate anonymous transactions, making it challenging for authorities to trace and investigate criminal activities.
  2. Privacy concerns: While blockchain offers enhanced data security, the transparency of certain blockchain implementations can compromise privacy. Public blockchains store transactions visible to all participants, potentially exposing sensitive information. This can be a concern when dealing with personal or confidential data.
  3. Energy consumption: Some blockchain networks, particularly those relying on proof-of-work consensus algorithms, consume substantial amounts of energy. The mining process used to validate and add new blocks to the chain requires significant computational power, contributing to carbon emissions and environmental impact.
  4. Regulatory challenges: The decentralized nature of blockchain poses regulatory challenges, particularly when it comes to issues like taxation, money laundering, and consumer protection. The lack of a centralized authority overseeing blockchain transactions can make it difficult for governments to enforce regulations and protect the interests of individuals and businesses.
  5. System vulnerabilities and smart contract risks: While blockchain technology itself is considered secure, the implementation and smart contracts built on top of it can be vulnerable to bugs, coding errors, and cyber attacks. Exploiting these vulnerabilities can lead to financial losses and disruption of blockchain-based systems.

It is crucial to recognize that the use and impact of blockchain technology depend on how it is implemented, regulated, and utilized by individuals and organizations. Responsible and ethical practices are necessary to maximize the potential positive outcomes while minimizing the negative implications.

Big Data

A term used to describe a very large volume of data, characterized by the 4 V’s

Volume (infinite source)

The data must be huge in volume to be considered big data

  • Includes other sources such as sensor data, GPS signals and even photographs. There’s so much it’s almost infinite

Variety (infinite source)

There are many types and formats of data available to collect.

  • Includes: social media posts, videos, photos, and document files. With so much variety we can also put this in the infinite category

Veracity (finite source: independent from third parties and limited)

Defined by the accuracy of the data.

  • There is a limit to how truthful it can be.

Velocity (finite source: independent from third parties and limited)

The speed at which a high volume of information has to be generated, handled, and managed.

Use of Big Data in Real World Settings

  • Used when a large volume of data is being processed to identify trends and patterns
  • Can be used to identify things such as learning about ways to streamline business logistics or figure out ways to improve revenue
  • Ex. Facebook noticed a trend of users shifting from using Facebook as social media to using it as a marketplace, by reading users’ chat history (potential privacy violation?)
    • Noticed that people were adding each other and buying/selling goods to each other
    • Resulted in the creation of Facebook Marketplace to monetize it for themselves
  • Ex. Citibank developed a real-time learning and predictive modeling system that uses the big data collected to detect potentially fraudulent transactions

Data Dilemmas

Is the data collected ethical and legal?

Tech companies shouldn’t be collecting excessive amounts of data, and consent from users needs to be obtained prior to collection of data

Are the data sets biased?

Companies may purposefully collect data geared toward making them look good

  • Ex. Facebook news feed being filled with things that the users are more likely to read, resulting in bias rather than offering different perspectives.

Data privacy: the ability for individuals to control their personal information

  • Where is the data stored? Who has access to the data? What if my information gets leaked?
  • Companies should be charged with massive fines if they fail to guard their user data

Data Dilemmas Relating to Data Structure

Data Reliability

How reliable is the data?
Is the data complete and accurate?

Data integrity: the data’s trustworthiness and level of accuracy

Has the data been tampered with by external factors like viruses and malware?
Can I trust the data and its source?

Data Dilemmas Relating to Error

Outdated data?

The world is constantly changing and evolving. Hence “facts” from 20 years ago may no longer be valid in the current day

Human error and lack of precision

Manual data entry by humans is prone to error (ex. This doc)

  • Easy to delete data by accident (oopsies I deleted my Google account and now this is all gone)

All these affect data integrity and data reliability