This is week 3-2. Having covered Multivariate Categorical Data, the natural next topic is Multivariate Quantitative Data.
Start with Me | Coursera - Understanding and Visualizing Data with Python week 3-2 - Multivariate Quantitative Data and Quiz
Multivariate Quantitative Data
Gathering Multivariate Quantitative Data
What is your age?
Let's measure your BMI
Let's measure your blood pressure
Let's measure your cholesterol level
What is Multivariate Quantitative Data?
more than one trait recorded per unit
takes on a measured numeric value
Recording Multivariate Quantitative Data
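One way to picture how such measurements are recorded is one row per unit and one numeric column per trait. The sketch below is only an illustration: the `Participant` class, its field names, the units, and every value are hypothetical and do not come from the course.

```python
from dataclasses import dataclass


@dataclass
class Participant:
    """One unit (a person) with several quantitative traits recorded."""
    age: int            # years
    bmi: float          # kg/m^2
    systolic_bp: float  # mmHg
    cholesterol: float  # mg/dL


# Hypothetical values, not real data.
records = [
    Participant(age=34, bmi=22.5, systolic_bp=118.0, cholesterol=180.0),
    Participant(age=51, bmi=27.1, systolic_bp=131.0, cholesterol=212.0),
    Participant(age=45, bmi=30.4, systolic_bp=140.0, cholesterol=198.0),
]

# Each column is one quantitative variable measured on every unit.
ages = [p.age for p in records]
bmis = [p.bmi for p in records]
print(ages)
print(bmis)
```

Pulling out one column at a time gives the input for a univariate histogram; pairing two columns gives the input for a scatterplot.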
Displaying with Univariate Histograms
Displaying with a Scatterplot
- A scatter plot graphs two quantitative variables together
- It is useful for examining the association between two variables
Association - Type
Linear association
the pattern is a line
Quadratic association
the pattern is parabolic
No association
there is no pattern
Association - Direction
Positive linear association
the pattern has a positive slope: when x increases, y increases
Negative linear association
the pattern has a negative slope: when x increases, y decreases
Association - Strength
weak linear association
points are largely scattered along a line
moderate linear association
points are partially scattered along a line
strong linear association
points are minimally scattered along a line
Correlation: a number between -1 and 1 indicating the strength and direction (sign) of the linear association between two variables
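To make the idea of a number between -1 and 1 concrete, here is a minimal sketch that computes Pearson's correlation coefficient by hand. It assumes the notes refer to Pearson's r; the function name `pearson_r` and the sample values are made up for illustration.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    # Sum of products of deviations, and sums of squared deviations.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    # Note: this divides by zero if either variable is constant.
    return sxy / math.sqrt(sxx * syy)


# Hypothetical values chosen only to illustrate the sign of r.
x = [1, 2, 3, 4, 5]
y_up = [2, 4, 6, 8, 10]    # perfectly linear, positive slope
y_down = [10, 8, 6, 4, 2]  # perfectly linear, negative slope
y_noisy = [2, 1, 4, 3, 5]  # positive trend with some scatter

print(pearson_r(x, y_up))
print(pearson_r(x, y_down))
print(pearson_r(x, y_noisy))
```

Values near 1 or -1 correspond to points that hug a line; values near 0 correspond to a weak or absent linear pattern.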
Correlation Does Not Imply Causation
Outliers in Multivariate Quantitative Data
Displaying Quantitative and Categorical Data
Quiz - Multivariate Data
A bicycle rental company has counted the number of bicycle rentals in each season (spring, summer, fall, winter) for the past two years.
Additionally, the company has collected weather data (temperature, wind speed and humidity).
Use the data for bicycle rentals and weather presented in the tables and graphs below to answer these practice quiz questions.
Which proportion describes the most popular season for renting bicycles in Year 1?
A. 15000 / 1243103
B. 641479 / 2049576
C. 419650 / 1243103
D. 1061129 / 3292679
Which proportion describes the least popular season for renting bicycles in Year 2?
A. 641479 / 2049576
B. 321348 / 471348
C. 2049576 / 3292679
D. 471348 / 3292679
E. 321348 / 2049576
Which statement best describes the meaning of 326,137 / 841,613?
A. The proportion of Total rentals that occurred in Winter.
B. The proportion of Year 1 rentals that occurred in Winter.
C. The proportion of Total rentals that occurred in Year 1.
D. The proportion of Total Winter rentals that occurred in Year 1.
How does the proportion of rides in the Summer compare between Year 1 and Year 2?
A. The proportion is higher in Year 2 because 571,273 is larger than 347,316.
B. The proportion is higher in Year 1 because 2,049,576 is larger than 1,243,103.
C. Can't tell without doing additional calculations
The company suspects that they will have a larger increase in rentals from registered riders compared to non-registered riders over the two years. They make this bar chart to see how the numbers compare.
For which group does the increase in riders seem larger?
B. Can't tell from this graph
D. They look the same
What kind of graph is this?
A. Bar chart
B. Side-by-side bar chart
C. Stacked bar chart
D. Mosaic plot
The bicycle company is interested in knowing how rides are affected by various weather conditions. To start with, they want to examine the registered wind speeds (after a normalization). Is wind speed a discrete or continuous variable?
C. Can't tell
The company wants to consider how weather patterns affect the bicycle rentals. They first consider how the measured temperature compares to the apparent temperature, or the temperature that humans perceive it to be. The temperatures have been normalized to fall on a scale between 0 and 1.
Yesterday the normalized real temperature was 0.4. Today the normalized real temperature is 0.8. Which day would you expect to have a higher apparent temperature?
C. Can't tell
The scatterplot between temperature and apparent temperature is linear. What is the strength of the scatterplot between temperature and apparent temperature?
D. Can't tell
Eventually, the bicycle company wants to think about how bicycle rides vary based on weather. After looking at humidity, they think that the humidity might be associated with the general weather conditions. They consider weather situations of 1 = clear to partly cloudy, 2 = misty with no to some clouds, 3 = light rain and light snow, and 4 = heavy rain, snow, thunderstorms, and other extreme weather.
Based on the side-by-side boxplots below, which weather condition has the highest mean humidity?
A. 1 = clear to partly cloudy
B. 2 = misty with no to some clouds
C. 3 = light rain and light snow
D. 4 = heavy rain, snow, thunderstorms, and other extreme weather
E. Can't tell
We would need to compare two proportions. For this, we would need to calculate two separate proportions with two different numerators and denominators.
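The "additional calculation" can be sketched in a few lines. The counts come from the answer options of the Summer question; treating 347,316 and 571,273 as the Summer counts, and 1,243,103 and 2,049,576 as the yearly totals, is an assumption, since the original table is not reproduced in these notes.

```python
# Counts taken from the quiz answer options; their role as Summer counts
# and yearly totals is an assumption (the source table is not shown here).
summer = {"Year 1": 347_316, "Year 2": 571_273}
total = {"Year 1": 1_243_103, "Year 2": 2_049_576}

# One proportion per year: Summer rentals divided by that year's total.
proportions = {year: summer[year] / total[year] for year in summer}

for year, p in proportions.items():
    print(f"{year}: {p:.4f}")
```

Both the Summer count and the yearly total are larger in Year 2, so neither raw comparison settles the question. Under the assumption above, the two proportions come out very close, about 0.279 in each year.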
There are groupings of bars within this graph. This indicates that there are at least two groups being compared in this side-by-side bar chart.
Wind speed is a quantitative continuous variable.
Because the scatterplot has a positive direction, we expect the apparent temperature to be larger when the actual temperature is larger.
Almost all of the points are very close to a line, so we consider the strength of the linear relationship to be strong.
A mean is not shown in boxplots, although the median is.
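A short sketch of why the mean cannot be read off a boxplot: the plot is drawn from the five-number summary, and the mean is not one of those five values. The humidity readings below are hypothetical, and `statistics.quantiles` is used with its default method, which may differ slightly from the quartile rule a particular plotting library applies.

```python
import statistics

# Hypothetical, right-skewed humidity readings (normalized to 0-1).
humidity = [0.40, 0.42, 0.45, 0.47, 0.50, 0.52, 0.55, 0.95]

# A boxplot is built from the five-number summary.
q1, median, q3 = statistics.quantiles(humidity, n=4)
five_number = (min(humidity), q1, median, q3, max(humidity))

# The mean is computed separately and is pulled up by the large value.
mean = statistics.mean(humidity)

print("five-number summary:", five_number)
print("mean:", mean)
```

In this made-up sample the single large reading pulls the mean above the median, so two groups could be ranked differently by mean than by median.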