---
title: IBM - What is Data Science
tags: Data Science
description: ibm data science certificate
---
# **IBM Data Science (W1)**
## What is Data Science
- A process of using data to understand different things.
- The art of uncovering the insights and trends that are hiding behind data. It's when you translate data into a story.
- I'd see data science as one's attempt to work with data, to find answers to questions that they are exploring.
## Fundamentals of Data Science
- ==Data scientists== use data analysis to add to the knowledge of the organization by investigating data, exploring the best way to use it to provide value to the business.
- ==Data scientists== can analyze structured and unstructured data from many sources, and depending on the nature of the problem, they can choose to analyze the data in different ways.
- ==Data scientists== can use powerful data visualization tools to help stakeholders understand the nature of the results, and the recommended action to take.
## The Many Paths to Data Science
> I went through many different stages in my life where I wanted to be a singer and then a doctor. And then I realized that I was good at math.[name=Diana Zarate]
## Defining Data Science
**Aspiring data scientists need to be:**
- **Curious**
-- If you're not curious, you will not know what to do with the data.
- **Argumentative**
-- If you can argue and plead a case, at least you can start somewhere. Then you learn from the data and modify your assumptions and hypotheses.
- **Judgmental**
-- If you do not have preconceived notions about things, you won't know where to begin.
- ==**<font color='#f00'>Ability to tell a story</font>**==
-- once you have your analytics, once you have your tabulations, now you should be able to tell a great story from it.
> *Your rise to prominence is pretty much relying on your ability to tell great stories*
:::info
Your competitive advantage is your understanding of some aspect of life in which you exceed others. Once you've figured out where your expertise lies, you start acquiring analytical skills: which platforms and tools to learn, and these will be specific to the industry that you're interested in. Once you have some proficiency in the tools, the next step is to apply your skills to real problems, and then tell the rest of the world what you can do with them.
:::
---
## What Do Data Scientists Do?
**How do you get a better solution that is efficient?**
- Identify the problem and establish a clear understanding of it.
- Gather the data for analysis.
- Identify the right tools to use, and develop a data strategy.
>Case studies are also helpful in customizing a potential solution. Once these conditions exist and available data is extracted, you can develop a machine learning model.
## Data Science Topics and Algorithms
Using complicated machine learning algorithms does not always guarantee achieving a better performance. Occasionally, a simple algorithm such as k-nearest neighbor can yield a satisfactory performance comparable to the one achieved using a complicated algorithm. It all depends on the data.
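To show how simple k-nearest neighbor is, here is a minimal pure-Python sketch. The toy points, the labels, and the function name `knn_predict` are made up for this note and do not come from the course.

```python
from collections import Counter
import math


def knn_predict(train, query, k=3):
    """Classify `query` by majority vote among its k nearest training points.

    `train` is a list of (features, label) pairs; distance is Euclidean.
    """
    # Sort training points by distance to the query and keep the k closest.
    nearest = sorted(train, key=lambda item: math.dist(item[0], query))[:k]
    # Majority vote over the labels of those neighbors.
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


# Hypothetical toy data: two well-separated clusters.
train = [
    ((1.0, 1.0), "A"), ((1.2, 0.8), "A"), ((0.9, 1.1), "A"),
    ((5.0, 5.0), "B"), ((5.2, 4.9), "B"), ((4.8, 5.1), "B"),
]

print(knn_predict(train, (1.1, 1.0)))  # a point near the first cluster
print(knn_predict(train, (5.1, 5.0)))  # a point near the second cluster
```

The whole "model" is just the stored training data plus a distance function, which is why it can serve as a quick baseline before trying anything more complicated.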
### Let's explain regression in the simplest possible terms:
:::info
If you have ever **taken a cab ride**, you understand regression. Here is how it works. The moment you sit in a cab, you see that there's a fixed amount on the meter. It says $2.50. Whether the cab moves or you get off right away, this is what you owe the driver **the moment you step into a cab**. That's a **constant**. You have to pay that amount if you have stepped into a cab. Then, as it starts moving, for **every meter or hundred meters the fare increases by a certain amount**. So **there's a relationship between distance and the amount you would pay above and beyond that constant**. And if you're not moving and you're stuck in traffic, then for every additional minute you have to pay more. So as the **minutes increase, your fare increases**. As the **distance increases, your fare increases**. And while all this is happening, you've already paid a base fare, which is the constant. This is what regression is. **Regression tells you what the base fare is, what the relationship is between time and the fare you have paid, and what the relationship is between the distance you have traveled and the fare you've paid**. If you don't know those relationships and only know how far and how long people traveled and how much they paid, regression allows you to compute the constant that you didn't know, that it was $2.50, and to compute the relationship between the fare and the distance and between the fare and the time. That is regression.
:::
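The cab example can be sketched as a small least-squares fit. This is a minimal pure-Python sketch, not code from the course: only the $2.50 base fare comes from the text, while the per-kilometre rate (1.20), the per-minute rate (0.40), the trip data, and the function names are invented for illustration.

```python
def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with pivoting."""
    n = len(b)
    # Build the augmented matrix so row operations apply to both sides.
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        # Pick the row with the largest pivot for numerical stability.
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    # Back substitution.
    x = [0.0] * n
    for i in reversed(range(n)):
        known = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - known) / m[i][i]
    return x


def fit_linear_regression(rows, targets):
    """Ordinary least squares via the normal equations (X^T X) beta = X^T y."""
    # Prepend 1.0 to every row so the first coefficient is the constant.
    x = [[1.0] + list(r) for r in rows]
    p = len(x[0])
    xtx = [[sum(r[i] * r[j] for r in x) for j in range(p)] for i in range(p)]
    xty = [sum(r[i] * y for r, y in zip(x, targets)) for i in range(p)]
    return solve(xtx, xty)


# Hypothetical trips: (distance in km, minutes in the cab).
trips = [(1.0, 5.0), (2.0, 6.0), (3.5, 15.0), (5.0, 12.0), (8.0, 30.0), (0.5, 10.0)]
# Fares generated from the assumed pricing rule (no noise, for clarity).
fares = [2.50 + 1.20 * d + 0.40 * t for d, t in trips]

base_fare, per_km, per_minute = fit_linear_regression(trips, fares)
print(round(base_fare, 2), round(per_km, 2), round(per_minute, 2))
```

The fit only sees distances, times, and fares, yet it recovers the constant and both rates. With real, noisy fares the recovered numbers would be estimates rather than exact values.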
---
## Cloud for Data Science
- It allows you to ==bypass== the physical limitations of the computers and the systems you're using and it allows you to ==deploy== the analytics and storage capacities of advanced machines that do not necessarily have to be your machine or your company's machine.
> bypass: to go around; deploy: to put into service
- ==Multiple collaborators== or teams can access the data simultaneously, working together on producing a solution
- It gives you access to open source technologies such as Apache Spark without the need to install and ==configure== them locally.
> configure: to set up
---
## What Makes Someone a Data Scientist?
- Despite their ubiquitous use, consensus evades the notions of big data and data science.
> ubiquitous: present everywhere; consensus: general agreement; evade: to avoid; notion: idea, concept
- being contested by individuals
> contested: disputed
- interested in protecting their discipline or academic turfs
> turf: territory
- Why a narrowly construed definition of either big data
> construed: interpreted
- love for number crunching to a quote
> number-crunching: processing large amounts of numerical data
- the diversity of opinion on these answers borders on hostility
- I do not use the data size as a restrictive clause. Data below a certain arbitrary threshold does not make one less of a data scientist.
- I wrote my master's thesis on forecasting housing prices and my doctoral dissertation on forecasting homebuilders' choices related to what they build
- They might not be designing new circuitry, distillation equipment
- offers certain thresholds
- Suffice it to say that the thriving academic community
:::warning
https://learner.coursera.help/hc/en-us/articles/360036160591
:::
## Data Science: The Sexiest Job in the 21st Century
[**Reading Paper**](https://drive.google.com/file/d/1HQaDddmvZnfgKI6ojfJU7jQsj7QBp3B4/view?usp=sharing)
## What Makes Someone a Data Scientist?
[**Reading Paper**](https://drive.google.com/file/d/1ew_6yBRIqthqCst-XAgN28MOv18AUIUt/view?usp=sharing)
# **Big Data and Data Mining (W2)**
## Foundations of Big Data
:::info
**Definition:**
Big Data refers to the dynamic, large and disparate volumes of data being created by people, tools, and machines. It requires new, innovative, and scalable technology to collect, host, and analytically process the vast amount of data gathered in order to derive real-time business insights that relate to consumers, risk, profit, performance, productivity management, and enhanced shareholder value.
:::
### Common Elements:
- **Velocity**
-- Speed at which data accumulates.
- **Volume**
-- The scale of the data, or the increase in the amount of data stored
> Approximately <font color='red'>**2.5**</font> quintillion bytes of data are stored every day
- **Variety**
-- The diversity of the data
- **Veracity**
-- The quality and origin of data, and its conformity to facts and accuracy. Attributes include consistency, completeness, integrity, and ambiguity
> <font color='red'>**80%**</font> of data is considered to be unstructured
- **Value**
-- Ability and need to turn data into value
The scale of the data being collected means that it’s not feasible to use conventional data analysis tools.
> feasible: practical, achievable; conventional: traditional
---
## What is Hadoop?
:::warning
**Traditionally in computation and processing data we would bring the data to the computer.** You'd have a program and you'd bring the data into the program.
**In a big data cluster**, what Larry Page and Sergey Brin came up with is pretty simple: they **took the data and sliced it into pieces**, replicated or triplicated each piece, and would **send the pieces of these files to thousands of computers**.
And then they would **send the same program to all these computers in the cluster. And each computer would run the program on its little piece of the file and send the results back**. The results would then be sorted and redistributed to another process.
:::
- The first one is <font color =red>**a map or a mapper process**</font>
- The second one is <font color =red>**a reduce process**</font>
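
A minimal single-machine sketch of the two processes, using word counting as the example task. The function names and the sample text are hypothetical. A real Hadoop cluster runs the map step on many computers in parallel, which this sketch only imitates with a loop.

```python
from collections import defaultdict


def mapper(chunk):
    """Map step: emit a (word, 1) pair for every word in one slice of the data."""
    return [(word.lower(), 1) for word in chunk.split()]


def shuffle(mapped_pairs):
    """Sort/group step: collect all values emitted for the same key."""
    groups = defaultdict(list)
    for key, value in mapped_pairs:
        groups[key].append(value)
    return groups


def reducer(key, values):
    """Reduce step: combine all values for one key into a single result."""
    return key, sum(values)


# The "file" is sliced into pieces; each piece would go to a different computer.
chunks = ["big data is big", "data science uses data"]

mapped = []
for chunk in chunks:  # in a cluster these calls would run in parallel
    mapped.extend(mapper(chunk))

counts = dict(reducer(k, v) for k, v in shuffle(mapped).items())
print(counts)
```

The same `mapper` is sent to every slice, and only the small intermediate results travel back to be grouped and reduced, which mirrors the idea of sending the program to the data.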
## How Big Data is Driving Digital Transformation
Digital Transformation affects business operations, updating existing processes and operations and creating new ones to harness the benefits of new technologies.
:::warning
In basketball, Big Data changed the way teams try to win, transforming the approach to the game.
In 2018, the Houston Rockets analyzed video tracking data to investigate which plays provided the best opportunities for high scores. **Data analysis revealed that the shots that provide the best opportunities for high scores are two-point dunks from inside the two-point zone, and three-point shots from outside the three-point line, not long-range two-point shots from inside it.**
:::
## Data Science Skills & Big Data
> I'm sort of a techy, geeky kind of person. I really like to play with technology in my spare time.
## Data Mining
[**Data Mining Paper**](https://drive.google.com/file/d/1GrLGFIOxH2zWKp0Cb0iVkkeJ1s3m2ElQ/view?usp=sharing)
## **Deep Learning and Machine Learning**
## What's the difference?
Big data is often described in terms of <font color =red>five V's</font>
- Velocity
- Volume
- Variety
- Veracity
- Value
**Data mining**
> the process of automatically searching and analyzing data, discovering previously unrevealed patterns.
- preprocessing the data to prepare it
- transforming it into an appropriate format
- extracting patterns with tools ranging from simple data visualization to machine learning and statistical models
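
A toy sketch of these stages, assuming a made-up list of purchase records. The data and the choice of "items bought together" as the pattern to look for are illustrative only.

```python
from collections import Counter
from itertools import combinations

# Hypothetical raw purchase records; None marks a missing entry.
raw = [
    "Bread, Milk",
    "milk, bread, eggs",
    None,
    "Eggs, Milk",
    "bread,  milk ",
]

# 1. Preprocess: drop records that are missing.
cleaned = [r for r in raw if r is not None]

# 2. Transform: turn each record into a set of normalized item names.
baskets = [{item.strip().lower() for item in r.split(",")} for r in cleaned]

# 3. Mine: count how often each pair of items appears in the same basket.
pair_counts = Counter()
for basket in baskets:
    for pair in combinations(sorted(basket), 2):
        pair_counts[pair] += 1

most_common_pair, count = pair_counts.most_common(1)[0]
print(most_common_pair, count)
```

Nobody told the program which pair to look for. The pattern comes out of the counts, which is the "previously unrevealed" part of the definition.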
**Machine learning:**
> A subset of AI that uses computer algorithms to analyze data and make intelligent decisions based on what it has learned, without being explicitly programmed
:::warning
Machine learning algorithms are trained with large sets of data and they learn from examples. They do not follow rules-based algorithms.
:::
Machine learning is what enables machines to solve problems on their own and make accurate predictions using the provided data.
**Deep learning:**
> A specialized subset of machine learning that uses layered neural networks to simulate human decision-making
:::warning
Deep learning algorithms can label and categorize information and identify patterns.
:::
It is what enables AI systems to continuously learn on the job and improve the quality and accuracy of results by determining whether decisions were correct.
**Artificial neural networks:**
> A collection of small computing units called neurons that take incoming data and learn to make decisions over time.
Neural networks are often many layers deep and are the reason deep learning algorithms become more efficient as the data sets increase in volume, as opposed to other machine learning algorithms that may plateau as data increases.
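
To make "a small computing unit that learns to make decisions over time" concrete, here is a minimal sketch of a single artificial neuron trained with the classic perceptron rule. It is one neuron, not a deep network. The task (logical AND), the learning rate, and the function names are assumptions chosen for this note.

```python
def step(x):
    """Threshold activation: the neuron fires (1) or stays silent (0)."""
    return 1 if x > 0 else 0


def predict(weights, bias, inputs):
    """Weighted sum of the inputs plus a bias, passed through the threshold."""
    return step(sum(w * x for w, x in zip(weights, inputs)) + bias)


def train_perceptron(samples, epochs=20, lr=1.0):
    """Train a single neuron with the perceptron update rule."""
    weights = [0.0, 0.0]
    bias = 0.0
    for _ in range(epochs):
        for inputs, target in samples:
            error = target - predict(weights, bias, inputs)
            # Nudge the weights toward the correct answer when the neuron is wrong.
            weights = [w + lr * error * x for w, x in zip(weights, inputs)]
            bias += lr * error
    return weights, bias


# Toy task: learn the logical AND of two binary inputs.
and_samples = [((0, 0), 0), ((0, 1), 0), ((1, 0), 0), ((1, 1), 1)]
weights, bias = train_perceptron(and_samples)
print([predict(weights, bias, x) for x, _ in and_samples])
```

The neuron starts out knowing nothing (all weights zero) and corrects itself each time it gets an example wrong. Deep learning stacks many such units in layers and uses smoother activations and a different training procedure.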
:::warning
**Difference between Artificial Intelligence and Data Science:**
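Data Science is the process and method for extracting knowledge and insights from large volumes of disparate data. Artificial Intelligence, by contrast, covers everything that allows computers to learn how to solve problems and make intelligent decisions. Data science can use AI techniques, but the two terms are not interchangeable.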
:::
## Neural Networks and Deep Learning
A neural network is a computer program that tries to mimic how our brains use neurons and synapses to process things, building complex networks that can be trained.
> algebra: the branch of mathematics that works with symbols and equations; mimic: to imitate
## Regression
[**Regression Paper**](https://drive.google.com/file/d/1Cdz_baA1u49d8lHSAlI1aVxL4WcyWKg7/view?usp=sharing)
---
# **Data Science in Business (W3)**
## How Can Someone Become a Data Scientist?
- You need to know how to program
- Know some algebra
- Know some basic statistics
> **I do a lot of self-learning.** **I learned about Hadoop all by myself: I read some articles, I watched some videos, I thought, I played.** I'm a builder, I'm a tinkerer, so if I want to figure out how to do something, I build it.
So I think **one of the ways you learn things is you do them, you have to do them**. Online learning platforms, especially now that we have things like IPython and Jupyter Notebooks and Zeppelin, mean that **you can actually go in** and take some of these courses, and **you can do things right then and you can see them and feel them and play with them**. At that point you'll start to get your head around what is actually happening.
## Recruiting for Data Science
- If you want to ==work in the **traditional market** research, **structured data environment**==, your skills should include some <font color=red>statistical knowledge</font>, some knowledge of basic statistical algorithms, and maybe some machine learning algorithms.
- If you want to ==work in **big data**==, then there's the other aspect of it and that is to be able to store data. So you start with the expertise in storing large amounts of data.
Three-step process:
1. Look into platforms that allow you to store large amounts of data
2. Learn to manipulate large amounts of data
3. Apply algorithms to those large sets of data
- If you want to be in the traditional predictive analytics environment and you're not working with big data, then R, Stata, or Python would be your tools.
- If you're working mostly with unstructured data, then Python is more suitable than R.
- If you're working with big data, then Hadoop and Spark are the environments that you will be working with.
:::info
When you're presenting your results, imagine you're driving on a mountain road and there's a sharp turn. You can't see what's beyond the turn. Then you make that turn and suddenly you see a tremendous valley in front of you, and there's this great sense of awe: I didn't know that, right? So when you present your findings and you have this great finding and you communicate it well, this is what people feel, because they were not expecting it. They were not aware of it, and then there's this great sense of happiness: I didn't know this, and now I know. And then it empowers them. It gives them ideas about what they can do with this knowledge, this new insight. It's a great sense of joy.
:::
## The Report Structure
[**Report Structure**](https://drive.google.com/file/d/1kG5ag0OFYs8LtwWmUJxJOwMFruIs1RD4/view?usp=sharing)
## Final Assignments