1) Define data analytics and data science. Are they similar or different? Give a reason.
Answer:
- Data Analytics is the process of examining past data to identify trends, patterns, and insights. It helps businesses make data-driven decisions.
- Data Science is a broader field that includes data analytics but also involves machine learning, artificial intelligence, and predictive modelling.
- Difference: They are related but not the same. Data analytics focuses on understanding what has already happened, while data science also builds models to predict future trends and automate decision-making.
Example:
A company analyses customer purchases (data analytics) to find the best-selling products, while data science predicts future sales trends based on past data.
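The contrast in this example can be sketched in a few lines of Python. This is only an illustration: the purchase records are invented, and the straight-line forecast is just one simple choice of predictive model, picked here for brevity.

```python
from collections import Counter

# Hypothetical purchase log: (month, product, units sold)
purchases = [
    (1, "pen", 10), (1, "notebook", 4),
    (2, "pen", 12), (2, "notebook", 5),
    (3, "pen", 14), (3, "notebook", 3),
]

# Data analytics: summarise what has already happened.
units_by_product = Counter()
for _, product, units in purchases:
    units_by_product[product] += units
best_seller = units_by_product.most_common(1)[0][0]

# Data science: fit a model to past data and use it to predict.
# Here the model is a least-squares straight line through the monthly totals.
monthly_totals = {}
for month, _, units in purchases:
    monthly_totals[month] = monthly_totals.get(month, 0) + units

months = sorted(monthly_totals)
totals = [monthly_totals[m] for m in months]
mean_x = sum(months) / len(months)
mean_y = sum(totals) / len(totals)
slope = (
    sum((x - mean_x) * (y - mean_y) for x, y in zip(months, totals))
    / sum((x - mean_x) ** 2 for x in months)
)
intercept = mean_y - slope * mean_x
forecast_month_4 = slope * 4 + intercept

print("Best-selling product so far:", best_seller)
print("Forecast of total units for month 4:", forecast_month_4)
```

The first half only describes past sales, while the second half uses the same data to estimate a value that has not been observed yet.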
2) Can you relate how data science is helpful in solving business problems?
Answer:
Data science plays a key role in business decision-making by analysing large amounts of data, identifying customer behaviour, predicting future trends, and automating processes. This helps businesses improve efficiency and customer satisfaction.
Example:
An e-commerce website like Daraz or Amazon uses data science to recommend products based on a customer's browsing history. This increases sales and improves user experience.
3) Database is useful in the field of data science. Defend this statement.
Answer:
A database is essential for data science because it stores large volumes of structured and unstructured data. Without a database, managing and analysing data would be difficult. It helps in data retrieval, processing, and efficient storage for analysis.
Example:
A hospital database stores patient records, including medical history, prescriptions, and test results. Data scientists can analyse this data to find patterns in diseases and improve treatments.
4) Compare machine learning and deep learning in the context of formal & informal education.
Answer:
- Machine Learning: A branch of AI where computers learn from data without being explicitly programmed. It is widely used in recommendation systems, predictive analytics, and automation.
- Deep Learning: A subset of machine learning that uses artificial neural networks to process complex data like images, speech, and text. It requires large datasets and powerful computing resources.
Example:
- In formal education, AI-powered tutoring systems analyse students' performance and suggest personalised study plans (machine learning).
- In informal education, YouTube recommends educational videos based on a user's past watch history. Recommendation systems at this scale typically rely on neural networks (deep learning).
5) What is meant by sources of data? Give three sources of data excluding those mentioned in the book.
Answer:
Sources of data are the origins from which data is collected for analysis and decision-making. The data they provide can be structured (organised into rows and columns) or unstructured (such as free text, images, or audio).
Examples:
- Social Media: Data collected from user interactions, likes, shares, and comments on platforms like Facebook and Instagram.
- IoT Devices: Smartwatches, fitness bands, and home automation systems collect data about user activity and environment.
- Online Surveys: Businesses conduct surveys through Google Forms or other platforms to gather customer feedback.
6) Differentiate between database and dataset.
Answer:
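- Database: An organised collection of data managed by database software, often stored in related tables. It can hold a large amount of data and supports adding, updating, and querying records.
- Dataset: A specific collection of data gathered for a particular analysis. It may be extracted from a database or come from a file or survey, and it usually contains only the relevant information.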
Example:
A university database stores all student records, including names, roll numbers, and marks. A dataset may be created with only final-year students' marks for performance analysis.
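The university example can be sketched with Python's built-in `sqlite3` module. The table name, column names, and student records below are made up for illustration, and an in-memory database stands in for a real university system.

```python
import sqlite3

# The database: holds every student record (hypothetical schema and rows).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE students ("
    "roll_no INTEGER PRIMARY KEY, name TEXT, year INTEGER, marks INTEGER)"
)
conn.executemany(
    "INSERT INTO students VALUES (?, ?, ?, ?)",
    [
        (1, "Ayesha", 4, 82),
        (2, "Bilal", 2, 67),
        (3, "Sana", 4, 91),
        (4, "Omar", 1, 58),
    ],
)

# The dataset: only the final-year marks needed for this analysis.
rows = conn.execute(
    "SELECT marks FROM students WHERE year = 4 ORDER BY roll_no"
)
final_year_marks = [row[0] for row in rows]
average = sum(final_year_marks) / len(final_year_marks)
conn.close()

print("Final-year marks:", final_year_marks)
print("Average:", average)
```

The database keeps all four records, while the dataset is the smaller list pulled out by the query for one purpose.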
7) Argue about the trends, outliers, and distribution of values in a dataset.
Answer:
- Trends: Patterns observed in data over time, such as increasing sales in summer.
- Outliers: Data points that are significantly different from others, such as one student scoring 100 in a test where most scored below 50.
- Distribution: The way data values are spread across a dataset, such as normal distribution, where most values are around the average.
Example:
A store finds that sales increase on weekends (trend), but one day had zero sales (outlier) due to a system issue. The distribution of sales shows that most days have sales between 50 and 100 items.
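The store example can be checked with a short Python sketch. The sales figures are invented, and the 1.5 x IQR rule used to flag outliers is one common convention chosen here as an assumption, not the only possible method.

```python
from collections import Counter
from statistics import mean, quantiles

# Hypothetical daily sales for two weeks, Monday to Sunday each week.
daily_sales = [55, 60, 58, 62, 65, 90, 95,
               57, 0, 61, 59, 66, 92, 97]

# Trend: compare average weekend sales with average weekday sales.
weekend = [s for i, s in enumerate(daily_sales) if i % 7 in (5, 6)]
weekday = [s for i, s in enumerate(daily_sales) if i % 7 not in (5, 6)]
weekend_higher = mean(weekend) > mean(weekday)

# Outliers: values more than 1.5 * IQR outside the middle 50% of the data.
q1, _, q3 = quantiles(daily_sales, n=4)
iqr = q3 - q1
outliers = [s for s in daily_sales
            if s < q1 - 1.5 * iqr or s > q3 + 1.5 * iqr]

# Distribution: count how many days fall into each 50-item bucket.
histogram = Counter((s // 50) * 50 for s in daily_sales)

print("Weekend sales higher on average:", weekend_higher)
print("Outliers:", outliers)
for start in sorted(histogram):
    print(f"{start}-{start + 49} items: {histogram[start]} days")
```

With this data the zero-sales day is the only value flagged as an outlier, and 13 of the 14 days fall in the 50-99 bucket.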
8) Why are summary statistics needed?
Answer:
Summary statistics such as the mean, median, range, and standard deviation provide a quick way to understand the key features of a dataset without inspecting every value. They help in decision-making and data interpretation.
Example:
A teacher wants to understand class performance. Instead of checking all 50 students' marks, they look at the average marks to get a general idea of how the class performed.
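As a minimal sketch, the statistics named above can be computed with Python's standard `statistics` module. The marks are hypothetical, and only ten are used to keep the example short.

```python
from statistics import mean, median, stdev

# Hypothetical marks for ten students.
marks = [45, 52, 58, 61, 64, 67, 70, 73, 78, 92]

summary = {
    "count": len(marks),
    "mean": mean(marks),
    "median": median(marks),
    "range": max(marks) - min(marks),
    "std_dev": round(stdev(marks), 2),  # sample standard deviation
}

for name, value in summary.items():
    print(f"{name}: {value}")
```

The mean gives the general level of the class, while the range and standard deviation show how widely individual marks are spread around it.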
9) Express big data in your own words. Explain three V's of big data with reference to email data.
Answer:
Big data refers to datasets so large, fast-growing, and varied that they cannot be stored or processed efficiently with traditional tools such as a single spreadsheet or database server. Handling it usually requires distributed storage and processing, often on cloud platforms.
Three V's of Big Data (related to emails):
- Volume: Millions of emails are sent and stored daily, creating a massive amount of data.
- Velocity: Emails are sent and received continuously, so new data arrives at high speed and must be processed in near real time.
- Variety: Emails contain text, images, attachments, and links, making data diverse.
Example:
Gmail processes billions of emails every day. It uses big data techniques to filter spam emails and recommend important messages.
10) Illustrate the purpose of data storage.
Answer:
Data storage allows users and businesses to save, retrieve, and manage information for future use. Well-managed storage also protects data against loss and unauthorised access, and makes it easy to reach when needed.
Example:
Cloud storage services like Google Drive and Dropbox help users store important files, photos, and documents securely. This data can be accessed anytime from any device.