变量的类型有哪些呢,数值变量还是分类变量,连续型还是离散型,名义还是有序

Start with Me | Coursera - Understanding and Visualizing Data with Python week 1-3 - Variable Types

终于来更新啦,这篇大概才是这周课程的重点了呜呜呜

介绍了两种不同的数据类型,每个数据类型又有两个子类型

如何判断不同变量属于什么类型呢?

那就Start with Me

听说,不同类型的变量可能会改变你对数据的处理方式 ## NHANES Data

NHANES : National Health and Nutrition Examination Survey

NHANES aims to assess the health and nutrition of children and adult in the US

  • MBI

  • Race

  • Age

  • Adult

Could we reasonably compute the average response for each of these two variables (BMI and Race) ?

  • BMI - Yes
  • Race - No

例子:NHANES 数据集

NHANES:国家健康和营养调查,目的:评估美国儿童和成年人的健康和营养

在如上表格中,显示了五个不同的个体,他们有着不同的 ID,有四个相应的变量

  • BMI,身高与体重比例的测量
  • Race,种族
    • 1:墨西哥裔美国人
    • 2:其他西班牙裔
    • 3:非西班牙裔白人
    • 4:非西班牙裔黑人
    • 5:其他
  • Age,年龄
  • Adult,成人指标变量
    • 1:≥ 18岁
    • 0:<18岁

对于 BMI 和 Race 变量,能否合理地分别计算出这两个变量的平均值?

  • 对 BMI 来说,求平均值是合理的,可以得到样本人群的平均 BMI
  • 对 Race 来说,求平均值没有直观的意义
    • 每个值对于不同的种族类别都有不同的编码,如果取平均值,比如2.7,那这个值并不能提供有意义的信息
    • 但是有一些分析方法喜欢对于分类变量 Race 取平均值

Quantitative Variables

Numerical, measurable quantities in which arithmetic operations often make sense

  • Continuous

    could take on any value within an interval many possible values

  • Discrete

    countable value, finite number of values

定量变量 / 数值变量

数值变量是一个可测量的数字,其中的算术运算往往是有意义的,如 BMI

数值变量有两种不同的类型

  • 连续型

    一个连续的数值变量可以在一个区间内取任何值,可以是很多很多可能的值,如 BMI、身高、体重、任何与时间有关的(一个人跑100m需要的时间、完成课程作业的时间)

  • 离散型

    是一组有限的可计数的数字,如:一个家庭中的孩子数量(可以是1、2、3,但不能是2.3)

    离散的必须是一些设定好的可数的值

Categorical / Qualitative Variables

Classifies individuals or items into different groups

  • Ordinal

    groups have an order or ranking

  • Nominal

    groups are merely names, no ranking

定性变量 / 分类变量

分类变量是将选项分为不同的组别,如 种族

分类变量有两种不同的类型

  • 有序 / 等级

    是有某种顺序或排名与之相关,如 大学年级可以分为大一、大二、大三、大四,通常按照这个顺序自然排名

  • 无序 / 名义

    没有任何与之相关的排序,如 种族、婚姻状况,在这个变量中移动不同的类别,意义不会改变

Quiz

The age of the individual was recorded at the time of the survey. What type of variable would age be considered?

A. Categorical Nominal

B. Categorical Ordinal

C. Quantitative Continuous

D. Quantitative Discrete

年龄是什么变量呢?

不会就选 C

有时会把年龄看作离散的数值变量,但是有时候你问小朋友你几岁了,他会回答你 “我三岁半了”,所以当把年龄看作关于时间的值,就是一个连续变量啦

The adult indicator variable is coded as a 1 if the individual is 18 years of age or older and a 0 if not. What type of variable would the adult indicator variable be considered?

A. Categorical Nominal

B. Categorical Ordinal

C. Quantitative Continuous

D. Quantitative Discrete

成人指标变量是什么变量呢?

如果你是未成年人就编码0,是成年人就编码1,在这里只是NHANES选择将其编码为0/1,我们也可以将其编码为M/A,那当然是名义分类变量啦

Practice Quiz

The following five questions are focused around a startup company. Management is trying to gain insight on current employees and have sent out a short anonymous survey to them. Read each prompt carefully and determine the variable type for each question.

  1. Employees were asked to report their typical daily commute time, in minutes. What type of variable would their response be considered?

通勤时间是什么变量呢?

与时间有关,连续数值变量,Quantitative Continuous

  1. Employees were asked to report their typical daily mode of transportation to and from work (i.e. Car, Bike, Bus, etc.). What type of variable would their response be considered?

交通方式(汽车、自行车、公交车)是什么变量呢?

不同类别,无序,名义分类变量,Categorical Nominal

  1. The company wanted to know how employees perceived the work of upper management. Employees were asked to report the satisfaction of upper management using a 1 to 5 scale (with the following representations: 1 - Extremely Unsatisfied, 2 - Unsatisfied, 3 - Neutral, 4 - Satisfied, 5 - Extremely Satisfied). What type of variable would their response be considered?

满意度(1:非常不满意 ... 5:非常满意)是什么变量呢?

不同类别,按照一定顺序,有序,有序分类变量,Categorical Ordinal

  1. It was reported that Fridays were generally lighter in terms of number of meetings held. Employees were asked to report the number of scheduled meetings they attended the previous Friday. What type of variable would their response be considered?

会议数量属于什么变量呢?

会议数量不会出现小数,离散型数值变量,Quantitative Discrete

  1. Management was playing around with the idea of having a food truck visit the office once a week and was trying to gauge how much employees would spend to help entice various food truck owners. Employees were asked to report the amount of money they believe they would spend on lunch (in $XX.XX) if a food truck came to the office once a week. What type of variable would their response be considered?

午饭的花费( $XX.XX)属于什么变量呢?

费用,在一定区间可以取任何值,连续型数值变量,Quantitative Continuous

恭喜!您通过了!

Assessment - 10min

The following five questions are focused around a public library. Staff members are trying to gain insight on current library card holders and have sent out a short survey to them. Read each prompt carefully and determine the variable type for each question.

  1. Library card holders were asked whether or not they have checked out a book from the library in the past month (yes or no). What type of variable would their response be considered?
  1. Library card holders were asked to report the amount of late fees they have been charged in the past year (input in the form of $XX.XX). What type of variable would their response be considered?
  1. Library card holders were asked to reflect on their most recent book they checked out and report the genre that it most closely represented (i.e. Science Fiction, Action, Romance, Mystery, etc.). What type of variable would their response be considered?
  1. The library recently added a new online checkout/renewal system. Library card holders were asked how many times they have used the new online system. What type of variable would their response be considered?
  1. Library card holders were asked to report the satisfaction of their library experience during their last visit using a 1 to 5 scale (with the following representations: 1 - Extremely Unsatisfied, 2 - Unsatisfied, 3 - Neutral, 4 - Satisfied, 5 - Extremely Satisfied). What type of variable would their response be considered?

请查看参考答案 1. Categorical Nominal

  1. Quantitative Continuous

  2. Categorical Nominal

  3. Quantitative Discrete

  4. Categorical Ordinal

恭喜!您通过了!成绩100%