{
"cells": [
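{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Introduction to the pandas Package\n",
"\n",
"[Official pandas documentation (English)](https://pandas.pydata.org/pandas-docs/stable/)\n",
"\n",
"[pandas documentation (Chinese)](https://pypandas.cn/docs/)\n",
"\n",
"pandas is an open-source Python library for data analysis and data manipulation. It provides convenient data structures and analysis tools, and it is one of the most important tools for data mining and data analysis in Python.\n",
"\n",
"## DataFrame\n",
"\n",
"A DataFrame is a two-dimensional labeled data structure in pandas, similar to an Excel spreadsheet or a table in a SQL database. It is the basic unit of data analysis and processing: its columns can hold different data types, and it offers a rich set of functions and methods for manipulating data.\n",
"\n",
"Each column of a DataFrame is a Series, which likewise provides a rich set of functions and methods for manipulating data."
]
},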
{
"cell_type": "code",
"execution_count": 1,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
" Name Age Sex\n",
"0 Braund, Mr. Owen Harris 22 male\n",
"1 Allen, Mr. William Henry 35 male\n",
"2 Bonnell, Miss. Elizabeth 58 female\n",
"\n",
"0 22\n",
"1 35\n",
"2 58\n",
"Name: Age, dtype: int64\n",
"\n",
"58\n"
]
}
],
"source": [
"import pandas as pd\n",
"\n",
"df = pd.DataFrame(\n",
" {\n",
" \"Name\": [\n",
" \"Braund, Mr. Owen Harris\",\n",
" \"Allen, Mr. William Henry\",\n",
" \"Bonnell, Miss. Elizabeth\",\n",
" ],\n",
" \"Age\": [22, 35, 58],\n",
" \"Sex\": [\"male\", \"male\", \"female\"],\n",
" }\n",
")\n",
"\n",
"print(df)\n",
"print()\n",
"print(df[\"Age\"])\n",
"print()\n",
"print(df[\"Age\"].max())"
]
},
{
"cell_type": "code",
"execution_count": 2,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
" Age\n",
"count 3.000000\n",
"mean 38.333333\n",
"std 18.230012\n",
"min 22.000000\n",
"25% 28.500000\n",
"50% 35.000000\n",
"75% 46.500000\n",
"max 58.000000"
]
},
"execution_count": 2,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"import pandas as pd\n",
"\n",
"df = pd.DataFrame(\n",
" {\n",
" \"Name\": [\n",
" \"Braund, Mr. Owen Harris\",\n",
" \"Allen, Mr. William Henry\",\n",
" \"Bonnell, Miss. Elizabeth\",\n",
" ],\n",
" \"Age\": [22, 35, 58],\n",
" \"Sex\": [\"male\", \"male\", \"female\"],\n",
" }\n",
")\n",
"\n",
"df.describe()"
]
},
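{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Reading Files\n",
"pandas provides many functions for reading data from a variety of sources, such as CSV, Excel, JSON, and SQL databases. These functions generally return a DataFrame, which makes the subsequent analysis and processing convenient."
]
},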
{
"cell_type": "code",
"execution_count": 3,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
" sepal_length sepal_width petal_length petal_width species\n",
"0 5.1 3.5 1.4 0.2 setosa\n",
"1 4.9 3.0 1.4 0.2 setosa\n",
"2 4.7 3.2 1.3 0.2 setosa\n",
"3 4.6 3.1 1.5 0.2 setosa\n",
"4 5.0 3.6 1.4 0.2 setosa"
]
},
"execution_count": 3,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"import pandas as pd\n",
"df = pd.read_csv(\"iris.csv\")\n",
"df.head()"
]
},
{
"cell_type": "code",
"execution_count": 4,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"sepal_length float64\n",
"sepal_width float64\n",
"petal_length float64\n",
"petal_width float64\n",
"species object\n",
"dtype: object"
]
},
"execution_count": 4,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"import pandas as pd\n",
"df = pd.read_csv(\"iris.csv\")\n",
"df.dtypes"
]
},
{
"cell_type": "code",
"execution_count": 5,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
" sepal_length sepal_width petal_length petal_width\n",
"count 150.000000 150.000000 150.000000 150.000000\n",
"mean 5.843333 3.054000 3.758667 1.198667\n",
"std 0.828066 0.433594 1.764420 0.763161\n",
"min 4.300000 2.000000 1.000000 0.100000\n",
"25% 5.100000 2.800000 1.600000 0.300000\n",
"50% 5.800000 3.000000 4.350000 1.300000\n",
"75% 6.400000 3.300000 5.100000 1.800000\n",
"max 7.900000 4.400000 6.900000 2.500000"
]
},
"execution_count": 5,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"import pandas as pd\n",
"df = pd.read_csv(\"iris.csv\")\n",
"df.describe()"
]
},
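{
"cell_type": "markdown",
"metadata": {},
"source": [
"The other readers follow the same pattern as `pd.read_csv`. As a minimal sketch, assuming a small in-memory table instead of a real file (the `sample` DataFrame and its values are made up for illustration), the next cell writes a DataFrame to CSV text and to JSON text, then reads both back with `pd.read_csv` and `pd.read_json`. Wrapping the text in `io.StringIO` lets the readers treat it like an open file."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"import io\n",
"import pandas as pd\n",
"\n",
"# Hypothetical sample data, used only for illustration\n",
"sample = pd.DataFrame({\"species\": [\"setosa\", \"virginica\"], \"petal_length\": [1.4, 5.5]})\n",
"\n",
"# Round trip through CSV text; index=False keeps the row labels out of the output\n",
"csv_text = sample.to_csv(index=False)\n",
"from_csv = pd.read_csv(io.StringIO(csv_text))\n",
"print(from_csv)\n",
"print()\n",
"\n",
"# Round trip through JSON text, stored as one object per row\n",
"json_text = sample.to_json(orient=\"records\")\n",
"from_json = pd.read_json(io.StringIO(json_text))\n",
"print(from_json)"
]
},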
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Selecting Subsets\n",
"\n",
"We use the Titanic dataset as an example to show how to select subsets of a DataFrame."
]
},
{
"cell_type": "code",
"execution_count": 6,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
" PassengerId Survived Pclass \\\n",
"0 1 0 3 \n",
"1 2 1 1 \n",
"2 3 1 3 \n",
"3 4 1 1 \n",
"4 5 0 3 \n",
"\n",
" Name Sex Age SibSp \\\n",
"0 Braund, Mr. Owen Harris male 22.0 1 \n",
"1 Cumings, Mrs. John Bradley (Florence Briggs Th... female 38.0 1 \n",
"2 Heikkinen, Miss. Laina female 26.0 0 \n",
"3 Futrelle, Mrs. Jacques Heath (Lily May Peel) female 35.0 1 \n",
"4 Allen, Mr. William Henry male 35.0 0 \n",
"\n",
" Parch Ticket Fare Cabin Embarked \n",
"0 0 A/5 21171 7.2500 NaN S \n",
"1 0 PC 17599 71.2833 C85 C \n",
"2 0 STON/O2. 3101282 7.9250 NaN S \n",
"3 0 113803 53.1000 C123 S \n",
"4 0 373450 8.0500 NaN S "
]
},
"execution_count": 6,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"import pandas as pd\n",
"titanic = pd.read_csv(\"titanic.csv\")\n",
"titanic.head()"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"For example, if we are interested in the passengers' ages, we can extract the `Age` column as follows:"
]
},
{
"cell_type": "code",
"execution_count": 7,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"0 22.0\n",
"1 38.0\n",
"2 26.0\n",
"3 35.0\n",
"4 35.0\n",
"Name: Age, dtype: float64"
]
},
"execution_count": 7,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"import pandas as pd\n",
"titanic = pd.read_csv(\"titanic.csv\")\n",
"ages = titanic[\"Age\"]  # select the Age column (the result is a Series)\n",
"ages.head()"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"If we are interested in two of the columns, we can pass a list of column names:"
]
},
{
"cell_type": "code",
"execution_count": 8,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
" Age Sex\n",
"0 22.0 male\n",
"1 38.0 female\n",
"2 26.0 female\n",
"3 35.0 female\n",
"4 35.0 male"
]
},
"execution_count": 8,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"import pandas as pd\n",
"titanic = pd.read_csv(\"titanic.csv\")\n",
"age_sex = titanic[[\"Age\", \"Sex\"]]  # select the Age and Sex columns (the result is a DataFrame)\n",
"age_sex.head()"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"To keep only the male passengers, that is, to filter the table down to certain rows, we can do the following:"
]
},
{
"cell_type": "code",
"execution_count": 9,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
" PassengerId Survived Pclass Name Sex Age \\\n",
"0 1 0 3 Braund, Mr. Owen Harris male 22.0 \n",
"4 5 0 3 Allen, Mr. William Henry male 35.0 \n",
"5 6 0 3 Moran, Mr. James male NaN \n",
"6 7 0 1 McCarthy, Mr. Timothy J male 54.0 \n",
"7 8 0 3 Palsson, Master. Gosta Leonard male 2.0 \n",
"\n",
" SibSp Parch Ticket Fare Cabin Embarked \n",
"0 1 0 A/5 21171 7.2500 NaN S \n",
"4 0 0 373450 8.0500 NaN S \n",
"5 0 0 330877 8.4583 NaN Q \n",
"6 0 0 17463 51.8625 E46 S \n",
"7 3 1 349909 21.0750 NaN S "
]
},
"execution_count": 9,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"import pandas as pd\n",
"titanic = pd.read_csv(\"titanic.csv\")\n",
"male_all = titanic[titanic[\"Sex\"] == \"male\"]\n",
"male_all.head()"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Similarly, we can select all passengers older than 35:"
]
},
{
"cell_type": "code",
"execution_count": 10,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
" PassengerId Survived Pclass \\\n",
"1 2 1 1 \n",
"6 7 0 1 \n",
"11 12 1 1 \n",
"13 14 0 3 \n",
"15 16 1 2 \n",
"\n",
" Name Sex Age SibSp \\\n",
"1 Cumings, Mrs. John Bradley (Florence Briggs Th... female 38.0 1 \n",
"6 McCarthy, Mr. Timothy J male 54.0 0 \n",
"11 Bonnell, Miss. Elizabeth female 58.0 0 \n",
"13 Andersson, Mr. Anders Johan male 39.0 1 \n",
"15 Hewlett, Mrs. (Mary D Kingcome) female 55.0 0 \n",
"\n",
" Parch Ticket Fare Cabin Embarked \n",
"1 0 PC 17599 71.2833 C85 C \n",
"6 0 17463 51.8625 E46 S \n",
"11 0 113783 26.5500 C103 S \n",
"13 5 347082 31.2750 NaN S \n",
"15 0 248706 16.0000 NaN S "
]
},
"execution_count": 10,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"import pandas as pd\n",
"titanic = pd.read_csv(\"titanic.csv\")\n",
"above_35 = titanic[titanic[\"Age\"] > 35]\n",
"above_35.head()"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Let's break this process into steps:"
]
},
{
"cell_type": "code",
"execution_count": 11,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"0 False\n",
"1 True\n",
"2 False\n",
"3 False\n",
"4 False\n",
" ... \n",
"886 False\n",
"887 False\n",
"888 False\n",
"889 False\n",
"890 False\n",
"Name: Age, Length: 891, dtype: bool"
]
},
"execution_count": 11,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"import pandas as pd\n",
"titanic = pd.read_csv(\"titanic.csv\")\n",
"titanic[\"Age\"] > 35"
]
},
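{
"cell_type": "markdown",
"metadata": {},
"source": [
"We can see that `titanic[\"Age\"] > 35` actually produces a Series whose elements are of type bool. This Series can be used as an index (a boolean mask): the rows where it is True are selected."
]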
},
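{
"cell_type": "markdown",
"metadata": {},
"source": [
"Boolean masks can also be combined. The next cell is a sketch on a small hand-made DataFrame (the `people` table is hypothetical and not part of the Titanic data). `&` means \"and\", `|` means \"or\", and each condition needs its own parentheses. A comparison with a missing value evaluates to False, so a row with a missing age is never selected by a condition such as `Age > 35`."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"import numpy as np\n",
"import pandas as pd\n",
"\n",
"# Hypothetical data; the last row has a missing age\n",
"people = pd.DataFrame(\n",
"    {\n",
"        \"Name\": [\"A\", \"B\", \"C\", \"D\"],\n",
"        \"Age\": [22.0, 38.0, 54.0, np.nan],\n",
"        \"Sex\": [\"male\", \"female\", \"male\", \"male\"],\n",
"    }\n",
")\n",
"\n",
"# Parentheses are required because & binds more tightly than > and ==\n",
"older_men = people[(people[\"Age\"] > 35) & (people[\"Sex\"] == \"male\")]\n",
"print(older_men)\n",
"print()\n",
"\n",
"# isin() is shorthand for several == conditions joined with |\n",
"print(people[people[\"Age\"].isin([22, 38])])\n",
"print()\n",
"\n",
"# Missing ages never satisfy a comparison, so select them with isna()\n",
"print(people[people[\"Age\"].isna()])"
]
},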
{
"cell_type": "markdown",
"metadata": {},
"source": [
"To select specific rows and specific columns at the same time, we use the `loc` indexer:"
]
},
{
"cell_type": "code",
"execution_count": 12,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"1 Cumings, Mrs. John Bradley (Florence Briggs Th...\n",
"6 McCarthy, Mr. Timothy J\n",
"11 Bonnell, Miss. Elizabeth\n",
"13 Andersson, Mr. Anders Johan\n",
"15 Hewlett, Mrs. (Mary D Kingcome) \n",
"Name: Name, dtype: object"
]
},
"execution_count": 12,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"import pandas as pd\n",
"titanic = pd.read_csv(\"titanic.csv\")\n",
"adult_names = titanic.loc[titanic[\"Age\"] > 35, \"Name\"]\n",
"adult_names.head()"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"We can also select by integer position, which requires the `iloc` indexer:"
]
},
{
"cell_type": "code",
"execution_count": 13,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
" Pclass Name Sex\n",
"9 2 Nasser, Mrs. Nicholas (Adele Achem) female\n",
"10 3 Sandstrom, Miss. Marguerite Rut female\n",
"11 1 Bonnell, Miss. Elizabeth female\n",
"12 3 Saundercock, Mr. William Henry male\n",
"13 3 Andersson, Mr. Anders Johan male\n",
"14 3 Vestrom, Miss. Hulda Amanda Adolfina female\n",
"15 2 Hewlett, Mrs. (Mary D Kingcome) female\n",
"16 3 Rice, Master. Eugene male\n",
"17 2 Williams, Mr. Charles Eugene male\n",
"18 3 Vander Planke, Mrs. Julius (Emelia Maria Vande... female\n",
"19 3 Masselmani, Mrs. Fatima female\n",
"20 2 Fynney, Mr. Joseph J male\n",
"21 2 Beesley, Mr. Lawrence male\n",
"22 3 McGowan, Miss. Anna \"Annie\" female\n",
"23 1 Sloper, Mr. William Thompson male\n",
"24 3 Palsson, Miss. Torborg Danira female"
]
},
"execution_count": 13,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"import pandas as pd\n",
"titanic = pd.read_csv(\"titanic.csv\")\n",
"titanic.iloc[9:25, 2:5]"
]
},
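{
"cell_type": "markdown",
"metadata": {},
"source": [
"One difference between the two indexers is easy to miss: a `loc` slice selects by label and includes its end label, while an `iloc` slice selects by position and excludes its end position, like ordinary Python slicing. The next cell is a small sketch with a made-up DataFrame (`scores` and its labels are hypothetical) whose row labels are strings, so labels and positions cannot be confused. Both calls select the same two rows, `bob` and `cat`."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"import pandas as pd\n",
"\n",
"# Hypothetical data with string row labels\n",
"scores = pd.DataFrame(\n",
"    {\"math\": [90, 75, 60, 88], \"art\": [70, 95, 80, 65]},\n",
"    index=[\"ann\", \"bob\", \"cat\", \"dan\"],\n",
")\n",
"\n",
"# Label-based: rows \"bob\" through \"cat\", both ends included\n",
"print(scores.loc[\"bob\":\"cat\", \"math\"])\n",
"print()\n",
"\n",
"# Position-based: rows at positions 1 and 2 (position 3 is excluded), column 0\n",
"print(scores.iloc[1:3, 0])"
]
},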
{
"cell_type": "markdown",
"metadata": {},
"source": [
"`iloc` and `loc` can also be used to modify the selected elements:"
]
},
{
"cell_type": "code",
"execution_count": 14,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
" PassengerId Survived Pclass \\\n",
"0 1 0 3 \n",
"1 2 1 1 \n",
"2 3 1 3 \n",
"3 4 1 1 \n",
"4 5 0 3 \n",
"\n",
" Name Sex Age SibSp Parch \\\n",
"0 anonymous male 22.0 1 0 \n",
"1 anonymous female 38.0 1 0 \n",
"2 anonymous female 26.0 0 0 \n",
"3 Futrelle, Mrs. Jacques Heath (Lily May Peel) female 35.0 1 0 \n",
"4 Allen, Mr. William Henry male 35.0 0 0 \n",
"\n",
" Ticket Fare Cabin Embarked \n",
"0 A/5 21171 7.2500 NaN S \n",
"1 PC 17599 71.2833 C85 C \n",
"2 STON/O2. 3101282 7.9250 NaN S \n",
"3 113803 53.1000 C123 S \n",
"4 373450 8.0500 NaN S "
]
},
"execution_count": 14,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"import pandas as pd\n",
"titanic = pd.read_csv(\"titanic.csv\")\n",
"titanic.iloc[0:3, 3] = \"anonymous\"\n",
"titanic.head()"
]
},
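{
"cell_type": "markdown",
"metadata": {},
"source": [
"Assignment through `loc` works with a boolean condition as well. The next cell is a sketch on a made-up DataFrame (the `fares` table and its `Tier` column are hypothetical): it overwrites one column only in the rows that satisfy the condition. Doing this in a single `loc` call matters, because chained indexing such as `fares[fares[\"Fare\"] > 50][\"Tier\"] = ...` may write to a temporary copy and leave `fares` unchanged."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"import pandas as pd\n",
"\n",
"# Hypothetical data; \"?\" marks values that have not been filled in yet\n",
"fares = pd.DataFrame({\"Fare\": [7.25, 71.28, 8.05], \"Tier\": [\"?\", \"?\", \"?\"]})\n",
"\n",
"# Rows where Fare > 50 get \"high\"; all remaining rows get \"low\"\n",
"fares.loc[fares[\"Fare\"] > 50, \"Tier\"] = \"high\"\n",
"fares.loc[fares[\"Fare\"] <= 50, \"Tier\"] = \"low\"\n",
"print(fares)"
]
},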
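{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Plotting\n",
"\n",
"pandas provides a convenient plotting interface built on matplotlib for drawing many kinds of charts. For example, to see the distribution of passenger ages, we can draw a histogram:"
]
},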
{
"cell_type": "code",
"execution_count": 15,
"metadata": {},
"outputs": [
{
"data": {
"image/png": "",
"text/plain": [
""
]
},
"metadata": {},
"output_type": "display_data"
}
],
"source": [
"import pandas as pd\n",
"import matplotlib.pyplot as plt\n",
"titanic = pd.read_csv(\"titanic.csv\")\n",
"titanic[\"Age\"].plot.hist(bins=20)\n",
"plt.show()"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Likewise, we can draw a box plot:"
]
},
{
"cell_type": "code",
"execution_count": 16,
"metadata": {},
"outputs": [
{
"data": {
"image/png": "iVBORw0KGgoAAAANSUhEUgAAAh8AAAGdCAYAAACyzRGfAAAAOXRFWHRTb2Z0d2FyZQBNYXRwbG90bGliIHZlcnNpb24zLjguMCwgaHR0cHM6Ly9tYXRwbG90bGliLm9yZy81sbWrAAAACXBIWXMAAA9hAAAPYQGoP6dpAAAl/ElEQVR4nO3df3RU9Z3/8Vd+kB9NmEFSmCSSQJagQQmtIoZBsjU0a5alHrMJRdTu2oqrXw1YEpQSK1gtEMUfoJJA9aRBT6UqbGQLrqCbKg27IWq6qKy7MbCxRJMMtiUzSSQTSOb7R5dpR/HHJJPPZJLn45x7au69c/PmjzJP7tx7J8zj8XgEAABgSHiwBwAAAKML8QEAAIwiPgAAgFHEBwAAMIr4AAAARhEfAADAKOIDAAAYRXwAAACjIoM9wKf19/ertbVVY8eOVVhYWLDHAQAAX4HH41FnZ6eSk5MVHv7F5zaGXXy0trYqJSUl2GMAAIABaGlp0aRJk75wn2EXH2PHjpX0p+EtFkuQpwEAAF+Fy+VSSkqK9338iwy7+Dj7UYvFYiE+AAAIMV/lkgkuOAUAAEYRHwAAwCjiAwAAGEV8AAAAo4gPAABgFPEBAACMIj4AAIBRxAcAADBq2D1kDMDI1NfXp9raWrW1tSkpKUnZ2dmKiIgI9lgAgsCvMx99fX1as2aN0tLSFBsbq6lTp+qnP/2pPB6Pdx+Px6O1a9cqKSlJsbGxys3NVVNTU8AHBxA6qqurlZ6erpycHF1//fXKyclRenq6qqurgz0agCDwKz4efPBBbd26VVu2bNF///d/68EHH9TGjRv1xBNPePfZuHGjHn/8cW3btk319fWKi4tTXl6eenp6Aj48gOGvurpaixYtUmZmpurq6tTZ2am6ujplZmZq0aJFBAgwCoV5/vK0xZf4zne+I5vNpsrKSu+6wsJCxcbG6he/+IU8Ho+Sk5O1cuVK3XnnnZIkp9Mpm82m7du3a8mSJV/6O1wul6xWq5xOJ9/tAoS4vr4+paenKzMzU7t37/b5mu3+/n7l5+fryJEjampq4iMYIMT58/7t15mPuXPnqqamRu+//74k6e2339bBgwe1YMECSVJzc7Pa29uVm5vrfY3ValVWVpbq6urOeUy32y2Xy+WzABgZamtr9cEHH+juu+/2CQ9JCg8PV2lpqZqbm1VbWxukCQEEg18XnK5evVoul0sZGRmKiIhQX1+f1q9frxtuuEGS1N7eLkmy2Ww+r7PZbN5tn1ZWVqb77rtvILMDGOba2tokSTNmzDjn9rPrz+4HYHTw68zHCy+8oGeffVY7duzQb3/7Wz399NN6+OGH9fTTTw94gNLSUjmdTu/S0tIy4GMBGF6SkpIkSUeOHDnn9rPrz+4HYHTwKz7uuusurV69WkuWLFFmZqb+4R/+QcXFxSorK5MkJSYmSpIcDofP6xwOh3fbp0VHR8tisfgsAEaG7OxsTZkyRRs2bFB/f7/Ptv7+fpWVlSktLU3Z2dlBmhBAMPgVH5988slnPreNiIjw/qWSlpamxMRE1dTUeLe7XC7V19fLbrcHYFwAoSQiIkKPPPKI9u7dq/z8fJ+7XfLz87V37149/PDDXGwKjDJ+XfNx9dVXa/369UpNTdXFF1+s//zP/9Sjjz6qm266SZIUFhamFStWaN26dZo2bZrS0tK0Zs0aJScnKz8/fyjmBzDMFRQUaNeuXVq5cqXmzp3rXZ+WlqZdu3apoKAgiNMBCAa/brXt7OzUmjVr9OKLL+rEiRNKTk7Wddddp7Vr1yoqKkrSnx4ydu+99+rJJ59UR0eH5s2bp4qKCl1wwQVf6Xdwqy0wMvGEU2Bk8+f926/4MIH4AAAg9AzZcz4AAAAGi/gAAABGER8AAMAo4gMAABhFfAAAAKOIDwAAYBTxAQAAjCI+AACAUcQHAAAwivgAAABGER8AAMAo4gMAABhFfAAAAKOIDwAAYFRksAcAMDr09fWptr
ZWbW1tSkpKUnZ2tiIiIoI9FoAg4MwHgCFXXV2t9PR05eTk6Prrr1dOTo7S09NVXV0d7NEABAHxAWBIVVdXa9GiRcrMzFRdXZ06OztVV1enzMxMLVq0iAABRqEwj8fjCfYQf8nlcslqtcrpdMpisQR7HACD0NfXp/T0dGVmZmr37t0KD//zv3f6+/uVn5+vI0eOqKmpiY9ggBDnz/s3Zz4ADJna2lp98MEHuvvuu33CQ5LCw8NVWlqq5uZm1dbWBmlCAMFAfAAYMm1tbZKkGTNmnHP72fVn9wMwOhAfAIZMUlKSJOnIkSPn3H52/dn9AIwOxAeAIZOdna0pU6Zow4YN6u/v99nW39+vsrIypaWlKTs7O0gTAggG4gPAkImIiNAjjzyivXv3Kj8/3+dul/z8fO3du1cPP/wwF5sCowwPGQMwpAoKCrRr1y6tXLlSc+fO9a5PS0vTrl27VFBQEMTpAAQDt9oCMIInnAIjmz/v35z5AGBERESErrzyymCPAWAYID4AGNHb26uKigodO3ZMU6dO1e23366oqKhgjwUgCIgPAENu1apV2rRpk86cOeNdd9ddd6m4uFgbN24M4mQAgoG7XQAMqVWrVumhhx5SQkKCnnrqKbW1tempp55SQkKCHnroIa1atSrYIwIwjAtOAQyZ3t5excXFKSEhQb/73e9UV1fnveDUbrdr8uTJ+sMf/qDu7m4+ggFCHN/tAmBYqKio0JkzZ1RQUKCMjAzl5OTo+uuvV05OjjIyMvT3f//3OnPmjCoqKoI9KgCDuOYDwJA5duyYJGnr1q1auHChrrnmGp06dUqxsbE6evSotm3b5rMfgNHBrzMfU6ZMUVhY2GeWoqIiSVJPT4+KioqUkJCg+Ph4FRYWyuFwDMngAIa/KVOmSJImTJig/fv367HHHtOTTz6pxx57TPv379eECRN89gMwOvgVH2+++aba2tq8y6uvvipJ+u53vytJKi4u1p49e7Rz504dOHBAra2tPL0QGMUyMzMlSR9//PE5Lzj9+OOPffYDMDr49bHL2X+lnPXAAw9o6tSp+ta3viWn06nKykrt2LFD8+fPlyRVVVVp+vTpOnTokObMmRO4qQGEhL8889nf3/+Z5Vz7ARj5BnzBaW9vr37xi1/opptuUlhYmBoaGnT69Gnl5uZ698nIyFBqaqrq6uo+9zhut1sul8tnATAy1NfXS5KysrJ08uRJ3XrrrTr//PN166236uTJk5o9e7bPfgBGhwHHx+7du9XR0aHvf//7kqT29nZFRUVp3LhxPvvZbDa1t7d/7nHKyspktVq9S0pKykBHAjDMnL2T32KxqLOzU5s2bdKyZcu0adMmdXZ2ev++GGZ3/AMYYgOOj8rKSi1YsEDJycmDGqC0tFROp9O7tLS0DOp4AIaPadOmSZJeffVVLV68WFlZWdqwYYOysrK0ePFi73VjZ/cDMDoM6Fbb3/3ud/q3f/s3VVdXe9clJiaqt7dXHR0dPmc/HA6HEhMTP/dY0dHRio6OHsgYAIa522+/XXfddZfi4uL09ttva+7cud5tkydPltVqVXd3t26//fYgTgnAtAGd+aiqqtLEiRO1cOFC77pZs2ZpzJgxqqmp8a5rbGzU8ePHZbfbBz8pgJATFRWl4uJiOZ1Oud1ulZSUaMuWLSopKVFPT4+cTqeKi4t5uikwyvh95qO/v19VVVW68cYbFRn555dbrVYtXbpUJSUlGj9+vCwWi5YvXy673c6dLsAodvaL4zZt2qRHH33Uuz4yMlJ33XUXXywHjEJ+f7fLK6+8ory8PDU2NuqCCy7w2dbT06OVK1fql7/8pdxut/Ly8lRRUfGFH7t8Gt/tAoxMvb29qqio0LFjxzR16lTdfvvtnPEARhB/3r/5YjkAADBofLEcAAAYtogPAABgFPEBAACMIj4AAIBRxAcAADBqQE84BQB/9fX1qba2Vm1tbUpKSlJ2drYiIiKCPRaAIODMB4AhV11drfT0dOXk5O
j6669XTk6O0tPTfb6iAcDoQXwAGFLV1dVatGiRMjMzVVdXp87OTtXV1SkzM1OLFi0iQIBRiIeMARgyfX19Sk9PV2Zmpnbv3q3w8D//e6e/v1/5+fk6cuSImpqa+AgGCHE8ZAzAsFBbW6sPPvhAd999t9xut5YtW6a8vDwtW7ZMbrdbpaWlam5uVm1tbbBHBWAQF5wCGDJtbW2SpHXr1umll17yrn/llVdUXl7u/Wbss/sBGB048wFgyCQlJUmSXnrpJUVFRWn16tU6evSoVq9eraioKG+QnN0PwOjANR8AhkxXV5fGjh2rsLAwffLJJ4qJifFu6+np0de+9jV5PB51dnYqPj4+iJMCGCyu+QAwLKxevVqS5PF4tHjxYp+7XRYvXqyz//Y5ux+A0YH4ADBkmpqaJElbtmzRu+++q7lz58pisWju3Lk6cuSInnjiCZ/9AIwOxAeAITNt2jRJ0ocffqijR4/qtdde044dO/Taa6+pqalJLS0tPvsBGB245gPAkDl16pS+9rWvKSoqSp2dnYqKivJu6+3t1dixY9Xb26tPPvlEsbGxQZwUwGD58/7NrbYAvtSp3j4d+7hrQK/NuWqBXnvlZcWPHaslN/0/ZeUVqn7/P+u5n2/T6d5e5Vy1QMf+2Cupd0DHnzohXrFRPKAMCCWc+QDwpY585NR3njg44Nef+Oef6tTR+s+sj03P0sTCNYMZTXuXz9OM862DOgaAwfPn/Zv4APClBnPmw3uMU6d0749X67U33lHO5TN13/oHAvJRC2c+gOGB+AAwLJ09g8LZCmDk4TkfAABg2CI+AACAUcQHAAAwivgAAABGER8AAMAo4gMAABhFfAAAAKOIDwAAYBTxAQAAjCI+AACAUX7Hx0cffaTvfe97SkhIUGxsrDIzM/XWW295t3s8Hq1du1ZJSUmKjY1Vbm6umpqaAjo0AAAIXX7Fx8mTJ3XFFVdozJgxevnll/Xee+/pkUce0XnnnefdZ+PGjXr88ce1bds21dfXKy4uTnl5eerp6Qn48AAAIPRE+rPzgw8+qJSUFFVVVXnXpaWlef/b4/Fo8+bNuueee3TNNddIkp555hnZbDbt3r1bS5YsCdDYAAAgVPl15uNXv/qVLrvsMn33u9/VxIkTdckll+ipp57ybm9ublZ7e7tyc3O966xWq7KyslRXV3fOY7rdbrlcLp8FAACMXH7Fx//+7/9q69atmjZtmvbv36/bbrtNd9xxh55++mlJUnt7uyTJZrP5vM5ms3m3fVpZWZmsVqt3SUlJGcifAwAAhAi/4qO/v1+XXnqpNmzYoEsuuUS33HKL/umf/knbtm0b8AClpaVyOp3epaWlZcDHAgAAw59f8ZGUlKSLLrrIZ9306dN1/PhxSVJiYqIkyeFw+OzjcDi82z4tOjpaFovFZwEAACOXX/FxxRVXqLGx0Wfd+++/r8mTJ0v608WniYmJqqmp8W53uVyqr6+X3W4PwLgAACDU+XW3S3FxsebOnasNGzZo8eLFeuONN/Tkk0/qySeflCSFhYVpxYoVWrdunaZNm6a0tDStWbNGycnJys/PH4r5AQBAiPErPmbPnq0XX3xRpaWluv/++5WWlqbNmzfrhhtu8O6zatUqdXd365ZbblFHR4fmzZunffv2KSYmJuDDAwCA0BPm8Xg8wR7iL7lcLlmtVjmdTq7/AEaYIx859Z0nDmrv8nmacb412OMACCB/3r/5bhcAAGAU8QEAAIwiPgAAgFHEBwAAMIr4AAAARhEfAADAKOIDAAAYRXwAAACjiA8AAGAU8QEAAIwiPgAAgFHEBwAAMIr4AAAARhEfAADAKOIDAAAYRXwAAACjiA8AAGAU8QEAAIwiPgAAgFHEBwAAMIr4AAAARhEfAADAKOIDAAAYRXwAAACjiA8AAGAU8QEAAIwiPgAAgFHEBwAAMIr4AAAARhEfAADAKOIDAAAY5Vd8/OQnP1FYWJjPkpGR4d3e09OjoqIiJSQkKD4+Xo
WFhXI4HAEfGgAAhC6/z3xcfPHFamtr8y4HDx70bisuLtaePXu0c+dOHThwQK2trSooKAjowAAAILRF+v2CyEglJiZ+Zr3T6VRlZaV27Nih+fPnS5Kqqqo0ffp0HTp0SHPmzBn8tAAAIOT5feajqalJycnJ+qu/+ivdcMMNOn78uCSpoaFBp0+fVm5urnffjIwMpaamqq6u7nOP53a75XK5fBYAADBy+RUfWVlZ2r59u/bt26etW7equblZ2dnZ6uzsVHt7u6KiojRu3Dif19hsNrW3t3/uMcvKymS1Wr1LSkrKgP4gAAAgNPj1scuCBQu8/z1z5kxlZWVp8uTJeuGFFxQbGzugAUpLS1VSUuL92eVyESAAAIxgg7rVdty4cbrgggt09OhRJSYmqre3Vx0dHT77OByOc14jclZ0dLQsFovPAgAARq5BxUdXV5eOHTumpKQkzZo1S2PGjFFNTY13e2Njo44fPy673T7oQQEAwMjg18cud955p66++mpNnjxZra2tuvfeexUREaHrrrtOVqtVS5cuVUlJicaPHy+LxaLly5fLbrdzpwsAAPDyKz4+/PBDXXfddfrDH/6gCRMmaN68eTp06JAmTJggSdq0aZPCw8NVWFgot9utvLw8VVRUDMngAAAgNPkVH88999wXbo+JiVF5ebnKy8sHNRQAABi5+G4XAABgFPEBAACMIj4AAIBRxAcAADCK+AAAAEYRHwAAwCjiAwAAGEV8AAAAo4gPAABgFPEBAACMIj4AAIBRxAcAADCK+AAAAEYRHwAAwCjiAwAAGEV8AAAAo4gPAABgFPEBAACMIj4AAIBRxAcAADCK+AAAAEYRHwAAwCjiAwAAGEV8AAAAo4gPAABgFPEBAACMIj4AAIBRxAcAADCK+AAAAEYRHwAAwCjiAwAAGDWo+HjggQcUFhamFStWeNf19PSoqKhICQkJio+PV2FhoRwOx2DnBAAAI8SA4+PNN9/Uz372M82cOdNnfXFxsfbs2aOdO3fqwIEDam1tVUFBwaAHBQAAI8OA4qOrq0s33HCDnnrqKZ133nne9U6nU5WVlXr00Uc1f/58zZo1S1VVVfqP//gPHTp0KGBDAwCA0DWg+CgqKtLChQuVm5vrs76hoUGnT5/2WZ+RkaHU1FTV1dWd81hut1sul8tnAQAAI1ekvy947rnn9Nvf/lZvvvnmZ7a1t7crKipK48aN81lvs9nU3t5+zuOVlZXpvvvu83cMAAAQovw689HS0qIf/vCHevbZZxUTExOQAUpLS+V0Or1LS0tLQI4LAACGJ7/io6GhQSdOnNCll16qyMhIRUZG6sCBA3r88ccVGRkpm82m3t5edXR0+LzO4XAoMTHxnMeMjo6WxWLxWQAAwMjl18cu3/72t/Xuu+/6rPvBD36gjIwM/ehHP1JKSorGjBmjmpoaFRYWSpIaGxt1/Phx2e32wE0NAABCll/xMXbsWM2YMcNnXVxcnBISErzrly5dqpKSEo0fP14Wi0XLly+X3W7XnDlzAjc1AAAIWX5fcPplNm3apPDwcBUWFsrtdisvL08VFRWB/jUAACBEDTo+Xn/9dZ+fY2JiVF5ervLy8sEeGgAAjEB8twsAADAq4B+7ABg+mn/frW73mWCP4XX0RJfP/w4ncdGRSvt6XLDHAEYF4gMYoZp/362ch18P9hjntOL5w8Ee4Zxeu/NKAgQwgPgARqizZzw2X/tNpU+MD/I0f9Jzuk8fnjylSefFKmZMRLDH8Tp6oksrnj88rM4SASMZ8QGMcOkT4zXjfGuwx/C6bEqwJwAQbFxwCgAAjCI+AACAUcQHAAAwivgAAABGER8AAMAo4gMAABhFfAAAAKOIDwAAYBTxAQAAjCI+AACAUcQHAAAwivgAAABGER8AAMAo4gMAABhFfAAAAKOIDwAAYBTxAQAAjCI+AACAUcQHAAAwivgAAABGER8AAMAo4gMAABhFfAAAAKOIDwAAYBTxAQAAjCI+AACAUX
7Fx9atWzVz5kxZLBZZLBbZ7Xa9/PLL3u09PT0qKipSQkKC4uPjVVhYKIfDEfChAQBA6PIrPiZNmqQHHnhADQ0NeuuttzR//nxdc801+q//+i9JUnFxsfbs2aOdO3fqwIEDam1tVUFBwZAMDgAAQlOkPztfffXVPj+vX79eW7du1aFDhzRp0iRVVlZqx44dmj9/viSpqqpK06dP16FDhzRnzpzATQ0AAELWgK/56Ovr03PPPafu7m7Z7XY1NDTo9OnTys3N9e6TkZGh1NRU1dXVBWRYAAAQ+vw68yFJ7777rux2u3p6ehQfH68XX3xRF110kQ4fPqyoqCiNGzfOZ3+bzab29vbPPZ7b7Zbb7fb+7HK5/B0JAACEEL/PfFx44YU6fPiw6uvrddttt+nGG2/Ue++9N+ABysrKZLVavUtKSsqAjwUAAIY/v+MjKipK6enpmjVrlsrKyvSNb3xDjz32mBITE9Xb26uOjg6f/R0OhxITEz/3eKWlpXI6nd6lpaXF7z8EAAAIHYN+zkd/f7/cbrdmzZqlMWPGqKamxrutsbFRx48fl91u/9zXR0dHe2/dPbsAAICRy69rPkpLS7VgwQKlpqaqs7NTO3bs0Ouvv679+/fLarVq6dKlKikp0fjx42WxWLR8+XLZ7XbudAEAAF5+xceJEyf0j//4j2pra5PVatXMmTO1f/9+/c3f/I0kadOmTQoPD1dhYaHcbrfy8vJUUVExJIMDAIDQ5Fd8VFZWfuH2mJgYlZeXq7y8fFBDAQCAkYvvdgEAAEb5/ZwPAKEjLNKlZlejwmPigz3KsNbs6lJYJM8YAkwhPoARbMy4et39xoZgjxESxoz7tqS/C/YYwKhAfAAj2OmOLD2y8HpNnciZjy9y7ESX7nj2WLDHAEYN4gMYwTxnLEqzXKiLEqzBHmVY6+9xynPm42CPAYwaXHAKAACMIj4AAIBRxAcAADCK+AAAAEYRHwAAwCjiAwAAGEV8AAAAo4gPAABgFPEBAACMIj4AAIBRxAcAADCK+AAAAEYRHwAAwCjiAwAAGEV8AAAAo4gPAABgFPEBAACMIj4AAIBRxAcAADCK+AAAAEYRHwAAwCjiAwAAGEV8AAAAo4gPAABgFPEBAACMIj4AAIBRxAcAADDKr/goKyvT7NmzNXbsWE2cOFH5+flqbGz02aenp0dFRUVKSEhQfHy8CgsL5XA4Ajo0AAAIXX7Fx4EDB1RUVKRDhw7p1Vdf1enTp3XVVVepu7vbu09xcbH27NmjnTt36sCBA2ptbVVBQUHABwcAAKEp0p+d9+3b5/Pz9u3bNXHiRDU0NOiv//qv5XQ6VVlZqR07dmj+/PmSpKqqKk2fPl2HDh3SnDlzAjc5gC906nSfJOnIR84gT/JnPaf79OHJU5p0XqxixkQEexyvoye6gj0CMKr4FR+f5nT+6S+18ePHS5IaGhp0+vRp5ebmevfJyMhQamqq6urqzhkfbrdbbrfb+7PL5RrMSAD+z7H/e0NdXf1ukCcJHXHRg/orEcBXNOD/p/X392vFihW64oorNGPGDElSe3u7oqKiNG7cOJ99bTab2tvbz3mcsrIy3XfffQMdA8DnuOriREnS1Inxih0mZxmOnujSiucPa/O131T6xPhgj+MjLjpSaV+PC/YYwKgw4PgoKirSkSNHdPDgwUENUFpaqpKSEu/PLpdLKSkpgzomAGl8XJSWXJ4a7DHOKX1ivGacbw32GACCZEDxsWzZMu3du1e/+c1vNGnSJO/6xMRE9fb2qqOjw+fsh8PhUGJi4jmPFR0drejo6IGMAQAAQpBfd7t4PB4tW7ZML774on79618rLS3NZ/usWbM0ZswY1dTUeNc1Njbq+PHjstvtgZkYAACENL/OfBQVFWnHjh36l3/5F40dO9Z7HYfValVsbKysVquWLl2qkpISjR8/XhaLRcuXL5fdbudOFwAAIMnP+Ni6dask6corr/RZX1VVpe9///uSpE2bNik8PFyFhY
Vyu93Ky8tTRUVFQIYFAAChz6/48Hg8X7pPTEyMysvLVV5ePuChAADAyMV3uwAAAKOIDwAAYBTxAQAAjCI+AACAUcQHAAAwivgAAABGER8AAMAo4gMAABhFfAAAAKOIDwAAYBTxAQAAjCI+AACAUcQHAAAwivgAAABGER8AAMAo4gMAABhFfAAAAKOIDwAAYBTxAQAAjCI+AACAUcQHAAAwivgAAABGER8AAMAo4gMAABhFfAAAAKOIDwAAYBTxAQAAjCI+AACAUcQHAAAwivgAAABGER8AAMAov+PjN7/5ja6++molJycrLCxMu3fv9tnu8Xi0du1aJSUlKTY2Vrm5uWpqagrUvAAAIMT5HR/d3d36xje+ofLy8nNu37hxox5//HFt27ZN9fX1iouLU15ennp6egY9LAAACH2R/r5gwYIFWrBgwTm3eTwebd68Wffcc4+uueYaSdIzzzwjm82m3bt3a8mSJYObFgAAhLyAXvPR3Nys9vZ25ebmetdZrVZlZWWprq7unK9xu91yuVw+CwAAGLkCGh/t7e2SJJvN5rPeZrN5t31aWVmZrFard0lJSQnkSAAAYJgJ+t0upaWlcjqd3qWlpSXYIwEAgCEU0PhITEyUJDkcDp/1DofDu+3ToqOjZbFYfBYAADByBTQ+0tLSlJiYqJqaGu86l8ul+vp62e32QP4qAAAQovy+26Wrq0tHjx71/tzc3KzDhw9r/PjxSk1N1YoVK7Ru3TpNmzZNaWlpWrNmjZKTk5Wfnx/IuQEAQIjyOz7eeust5eTkeH8uKSmRJN14443avn27Vq1ape7ubt1yyy3q6OjQvHnztG/fPsXExARuagAAELL8jo8rr7xSHo/nc7eHhYXp/vvv1/333z+owQAAwMgU9LtdAADA6EJ8AAAAo4gPAABgFPEBAACMIj4AAIBRxAcAADCK+AAAAEYRHwAAwCjiAwAAGEV8AAAAo4gPAABgFPEBAACMIj4AAIBRxAcAADCK+AAAAEYRHwAAwCjiAwAAGEV8AAAAo4gPAABgFPEBAACMIj4AAIBRxAcAADCK+AAAAEYRHwAAwCjiAwAAGEV8AAAAo4gPAABgFPEBAACMIj4AAIBRxAcAADCK+AAAAEYNWXyUl5drypQpiomJUVZWlt54442h+lUAACCERA7FQZ9//nmVlJRo27ZtysrK0ubNm5WXl6fGxkZNnDhxKH4lgCF0qrdPxz7uGvRxjp7o8vnfQJg6IV6xUREBOx6AoRfm8Xg8gT5oVlaWZs+erS1btkiS+vv7lZKSouXLl2v16tVf+FqXyyWr1Sqn0ymLxRLo0QAMwJGPnPrOEweDPcY57V0+TzPOtwZ7DGDU8+f9O+BnPnp7e9XQ0KDS0lLvuvDwcOXm5qquru4z+7vdbrndbu/PLpcr0CMBGKSpE+K1d/m8QR+n53SfPjx5SpPOi1XMmMCcrZg6IT4gxwFgTsDj4/e//736+vpks9l81ttsNv3P//zPZ/YvKyvTfffdF+gxAARQbFREwM4uXDYlIIcBEMKCfrdLaWmpnE6nd2lpaQn2SAAAYAgF/MzH17/+dUVERMjhcPisdzgcSkxM/Mz+0dHRio6ODvQYAABgmAr4mY+oqCjNmjVLNTU13nX9/f2qqamR3W4P9K8DAAAhZkhutS0pKdGNN96oyy67TJdffrk2b96s7u5u/eAHPxiKXwcAAELIkMTHtddeq48//lhr165Ve3u7vvnNb2rfvn2fuQgVAACMPkPynI/B4DkfAACEHn/ev4N+twsAABhdiA8AAGAU8QEAAIwiPgAAgFHEBwAAMIr4AAAARhEfAADAqCF5yNhgnH3siMvlCvIkAADgqzr7vv1VHh827OKjs7NTkpSSkhLkSQAAgL86OztltVq/cJ9h94TT/v5+tba2auzYsQoLCwv2OAACyOVyKSUlRS0tLTzBGBhhPB6POjs7lZycrPDwL76qY9jFB4CRi69PACBxwSkAADCM+AAAAE
YRHwCMiY6O1r333qvo6OhgjwIgiLjmAwAAGMWZDwAAYBTxAQAAjCI+AACAUcQHAAAwivgAEBB1dXWKiIjQwoULgz0KgGGOu10ABMTNN9+s+Ph4VVZWqrGxUcnJycEeCcAwxZkPAIPW1dWl559/XrfddpsWLlyo7du3+2z/1a9+pWnTpikmJkY5OTl6+umnFRYWpo6ODu8+Bw8eVHZ2tmJjY5WSkqI77rhD3d3dZv8gAIwgPgAM2gsvvKCMjAxdeOGF+t73vqef//zn3q/Vbm5u1qJFi5Sfn6+3335bt956q3784x/7vP7YsWP627/9WxUWFuqdd97R888/r4MHD2rZsmXB+OMAGGJ87AJg0K644gotXrxYP/zhD3XmzBklJSVp586duvLKK7V69Wq99NJLevfdd73733PPPVq/fr1OnjypcePG6eabb1ZERIR+9rOfefc5ePCgvvWtb6m7u1sxMTHB+GMBGCKc+QAwKI2NjXrjjTd03XXXSZIiIyN17bXXqrKy0rt99uzZPq+5/PLLfX5+++23tX37dsXHx3uXvLw89ff3q7m52cwfBIAxkcEeAEBoq6ys1JkzZ3wuMPV4PIqOjtaWLVu+0jG6urp066236o477vjMttTU1IDNCmB4ID4ADNiZM2f0zDPP6JFHHtFVV13lsy0/P1+//OUvdeGFF+pf//Vffba9+eabPj9feumleu+995Senj7kMwMIPq75ADBgu3fv1rXXXqsTJ07IarX6bPvRj36kX//613rhhRd04YUXqri4WEuXLtXhw4e1cuVKffjhh+ro6JDVatU777yjOXPm6KabbtLNN9+suLg4vffee3r11Ve/8tkTAKGDaz4ADFhlZaVyc3M/Ex6SVFhYqLfeekudnZ3atWuXqqurNXPmTG3dutV7t0t0dLQkaebMmTpw4IDef/99ZWdn65JLLtHatWt5VggwQnHmA4Bx69ev17Zt29TS0hLsUQAEAdd8ABhyFRUVmj17thISEvTv//7veuihh3iGBzCKER8AhlxTU5PWrVunP/7xj0pNTdXKlStVWloa7LEABAkfuwAAAKO44BQAABhFfAAAAKOIDwAAYBTxAQAAjCI+AACAUcQHAAAwivgAAABGER8AAMAo4gMAABj1/wE4/XiNqtoqiwAAAABJRU5ErkJggg==",
"text/plain": [
""
]
},
"metadata": {},
"output_type": "display_data"
}
],
"source": [
"import pandas as pd\n",
"import matplotlib.pyplot as plt\n",
"titanic = pd.read_csv(\"titanic.csv\")\n",
"titanic[\"Age\"].plot.box()\n",
"plt.show()"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Adding a New Column\n",
"The Titanic sank in 1912, so we can estimate each passenger's year of birth by subtracting their age from 1912."
]
},
{
"cell_type": "code",
"execution_count": 17,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
" PassengerId Survived Pclass \\\n",
"0 1 0 3 \n",
"1 2 1 1 \n",
"2 3 1 3 \n",
"3 4 1 1 \n",
"4 5 0 3 \n",
"\n",
" Name Sex Age SibSp \\\n",
"0 Braund, Mr. Owen Harris male 22.0 1 \n",
"1 Cumings, Mrs. John Bradley (Florence Briggs Th... female 38.0 1 \n",
"2 Heikkinen, Miss. Laina female 26.0 0 \n",
"3 Futrelle, Mrs. Jacques Heath (Lily May Peel) female 35.0 1 \n",
"4 Allen, Mr. William Henry male 35.0 0 \n",
"\n",
" Parch Ticket Fare Cabin Embarked YOB \n",
"0 0 A/5 21171 7.2500 NaN S 1890.0 \n",
"1 0 PC 17599 71.2833 C85 C 1874.0 \n",
"2 0 STON/O2. 3101282 7.9250 NaN S 1886.0 \n",
"3 0 113803 53.1000 C123 S 1877.0 \n",
"4 0 373450 8.0500 NaN S 1877.0 "
]
},
"execution_count": 17,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"import pandas as pd\n",
"titanic = pd.read_csv(\"titanic.csv\")\n",
"titanic[\"YOB\"] = 1912 - titanic[\"Age\"]\n",
"titanic.head()"
]
},
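{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Computing Statistics\n",
"We have already seen how to compute summary statistics. Here we compute them per group, for example the mean age of male and female passengers separately."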
]
},
{
"cell_type": "code",
"execution_count": 18,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
" Age\n",
"Sex \n",
"female 27.915709\n",
"male 30.726645"
]
},
"execution_count": 18,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"import pandas as pd\n",
"titanic = pd.read_csv(\"titanic.csv\")\n",
"titanic[[\"Sex\", \"Age\"]].groupby(\"Sex\").mean()"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Similarly, we can group by `Pclass` in addition to `Sex`."
]
},
{
"cell_type": "code",
"execution_count": 19,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"Sex Pclass\n",
"female 1 34.611765\n",
" 2 28.722973\n",
" 3 21.750000\n",
"male 1 41.281386\n",
" 2 30.740707\n",
" 3 26.507589\n",
"Name: Age, dtype: float64"
]
},
"execution_count": 19,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"import pandas as pd\n",
"titanic = pd.read_csv(\"titanic.csv\")\n",
"titanic.groupby([\"Sex\", \"Pclass\"])[\"Age\"].mean()"
]
},
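{
"cell_type": "markdown",
"metadata": {},
"source": [
"To count the passengers in each of the three cabin classes, either of the two approaches below works. They give the same counts, although `value_counts()` orders the result by count (descending) while `groupby` orders it by the group key."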
]
},
{
"cell_type": "code",
"execution_count": 20,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"Pclass\n",
"3 491\n",
"1 216\n",
"2 184\n",
"Name: count, dtype: int64"
]
},
"execution_count": 20,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"import pandas as pd\n",
"titanic = pd.read_csv(\"titanic.csv\")\n",
"titanic[\"Pclass\"].value_counts()"
]
},
{
"cell_type": "code",
"execution_count": 21,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"Pclass\n",
"1 216\n",
"2 184\n",
"3 491\n",
"Name: Pclass, dtype: int64"
]
},
"execution_count": 21,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"import pandas as pd\n",
"titanic = pd.read_csv(\"titanic.csv\")\n",
"titanic.groupby(\"Pclass\")[\"Pclass\"].count()"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Reshaping the Table Layout\n",
"\n",
"Sometimes we need to sort the rows of a table in a particular order. Using the Titanic dataset as an example:"
]
},
{
"cell_type": "code",
"execution_count": 22,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
" PassengerId Survived Pclass Name Sex \\\n",
"803 804 1 3 Thomas, Master. Assad Alexander male \n",
"755 756 1 2 Hamalainen, Master. Viljo male \n",
"644 645 1 3 Baclini, Miss. Eugenie female \n",
"469 470 1 3 Baclini, Miss. Helene Barbara female \n",
"78 79 1 2 Caldwell, Master. Alden Gates male \n",
"\n",
" Age SibSp Parch Ticket Fare Cabin Embarked \n",
"803 0.42 0 1 2625 8.5167 NaN C \n",
"755 0.67 1 1 250649 14.5000 NaN S \n",
"644 0.75 2 1 2666 19.2583 NaN C \n",
"469 0.75 2 1 2666 19.2583 NaN C \n",
"78 0.83 0 2 248738 29.0000 NaN S "
]
},
"execution_count": 22,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"import pandas as pd\n",
"titanic = pd.read_csv(\"titanic.csv\")\n",
"titanic.sort_values(by=\"Age\").head()"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Sometimes we need to convert a \"long\" table into a \"wide\" one. Let's take the nitrogen dioxide (`no2`) rows of [air_quality_long.csv](air_quality_long.csv) and look at a small subset of them."
]
},
{
"cell_type": "code",
"execution_count": 23,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
" city country location parameter \\\n",
"date.utc \n",
"2019-04-09 01:00:00+00:00 Antwerpen BE BETR801 no2 \n",
"2019-04-09 01:00:00+00:00 Paris FR FR04014 no2 \n",
"2019-04-09 02:00:00+00:00 London GB London Westminster no2 \n",
"2019-04-09 02:00:00+00:00 Antwerpen BE BETR801 no2 \n",
"2019-04-09 02:00:00+00:00 Paris FR FR04014 no2 \n",
"2019-04-09 03:00:00+00:00 London GB London Westminster no2 \n",
"\n",
" value unit \n",
"date.utc \n",
"2019-04-09 01:00:00+00:00 22.5 µg/m³ \n",
"2019-04-09 01:00:00+00:00 24.4 µg/m³ \n",
"2019-04-09 02:00:00+00:00 67.0 µg/m³ \n",
"2019-04-09 02:00:00+00:00 53.5 µg/m³ \n",
"2019-04-09 02:00:00+00:00 27.4 µg/m³ \n",
"2019-04-09 03:00:00+00:00 67.0 µg/m³ "
]
},
"execution_count": 23,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"import pandas as pd\n",
"air_quality = pd.read_csv(\"air_quality_long.csv\", index_col=\"date.utc\", parse_dates=True)\n",
"no2 = air_quality[air_quality[\"parameter\"] == \"no2\"]\n",
"no2_subset = no2.sort_index().groupby([\"location\"]).head(2)\n",
"no2_subset"
]
},
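{
"cell_type": "markdown",
"metadata": {},
"source": [
"A table in this layout is usually called a long table. If we want each of the three stations to become its own column, that is, to turn it into a wide table, pandas makes this easy."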
]
},
{
"cell_type": "code",
"execution_count": 24,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"location BETR801 FR04014 London Westminster\n",
"date.utc \n",
"2019-04-09 01:00:00+00:00 22.5 24.4 NaN\n",
"2019-04-09 02:00:00+00:00 53.5 27.4 67.0\n",
"2019-04-09 03:00:00+00:00 NaN NaN 67.0"
]
},
"execution_count": 24,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"import pandas as pd\n",
"air_quality = pd.read_csv(\"air_quality_long.csv\", index_col=\"date.utc\", parse_dates=True)\n",
"no2 = air_quality[air_quality[\"parameter\"] == \"no2\"]\n",
"no2_subset = no2.sort_index().groupby([\"location\"]).head(2)\n",
"no2_subset.pivot(columns=\"location\", values=\"value\")"
]
},
{
"cell_type": "code",
"execution_count": 25,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
""
]
},
"execution_count": 25,
"metadata": {},
"output_type": "execute_result"
},
{
"data": {
"image/png": "",
"text/plain": [
""
]
},
"metadata": {},
"output_type": "display_data"
}
],
"source": [
"import pandas as pd\n",
"air_quality = pd.read_csv(\"air_quality_long.csv\", index_col=\"date.utc\", parse_dates=True)\n",
"no2 = air_quality[air_quality[\"parameter\"] == \"no2\"]\n",
"no2.pivot(columns=\"location\", values=\"value\").plot()"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"We can also convert the wide table back into a long one."
]
},
{
"cell_type": "code",
"execution_count": 26,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
" date.utc location value\n",
"0 2019-04-09 01:00:00+00:00 BETR801 22.5\n",
"1 2019-04-09 02:00:00+00:00 BETR801 53.5\n",
"2 2019-04-09 03:00:00+00:00 BETR801 NaN\n",
"3 2019-04-09 01:00:00+00:00 FR04014 24.4\n",
"4 2019-04-09 02:00:00+00:00 FR04014 27.4\n",
"5 2019-04-09 03:00:00+00:00 FR04014 NaN\n",
"6 2019-04-09 01:00:00+00:00 London Westminster NaN\n",
"7 2019-04-09 02:00:00+00:00 London Westminster 67.0\n",
"8 2019-04-09 03:00:00+00:00 London Westminster 67.0"
]
},
"execution_count": 26,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"import pandas as pd\n",
"air_quality = pd.read_csv(\"air_quality_long.csv\", index_col=\"date.utc\", parse_dates=True)\n",
"no2 = air_quality[air_quality[\"parameter\"] == \"no2\"]\n",
"no2_subset = no2.sort_index().groupby([\"location\"]).head(2)\n",
"no2_pivot = no2_subset.pivot(columns=\"location\", values=\"value\").reset_index()\n",
"no2_pivot.melt(id_vars=\"date.utc\")"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Combining Tables\n",
"### Vertical Concatenation\n",
""
]
},
{
"cell_type": "code",
"execution_count": 27,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Shape of pm25: (1825, 6)\n",
"Shape of no2: (3447, 6)\n",
"Shape after concatenation: (5272, 6)\n"
]
},
{
"data": {
"text/plain": [
" city country location parameter value unit\n",
"date.utc \n",
"2019-06-18 06:00:00+00:00 Antwerpen BE BETR801 pm25 18.0 µg/m³\n",
"2019-06-17 08:00:00+00:00 Antwerpen BE BETR801 pm25 6.5 µg/m³\n",
"2019-06-17 07:00:00+00:00 Antwerpen BE BETR801 pm25 18.5 µg/m³\n",
"2019-06-17 06:00:00+00:00 Antwerpen BE BETR801 pm25 16.0 µg/m³\n",
"2019-06-17 05:00:00+00:00 Antwerpen BE BETR801 pm25 7.5 µg/m³"
]
},
"execution_count": 27,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"import pandas as pd\n",
"air_quality = pd.read_csv(\"air_quality_long.csv\", index_col=\"date.utc\", parse_dates=True)\n",
"# Split the data into two tables, one per measured parameter\n",
"air_quality_pm25 = air_quality[air_quality[\"parameter\"] == \"pm25\"]\n",
"air_quality_no2 = air_quality[air_quality[\"parameter\"] == \"no2\"]\n",
"# Check the shape of each table\n",
"print(\"Shape of pm25:\", air_quality_pm25.shape)\n",
"print(\"Shape of no2:\", air_quality_no2.shape)\n",
"# Concatenate the two tables row-wise\n",
"air_quality2 = pd.concat([air_quality_pm25, air_quality_no2], axis=0)\n",
"print(\"Shape after concatenation:\", air_quality2.shape)\n",
"air_quality2.head()"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Horizontal Joining\n",
"\n",
"\n",
"Suppose we now also have the station coordinates in [air_quality_stations.csv](air_quality_stations.csv), and we want to add them to the [air_quality_long.csv](air_quality_long.csv) table."
]
},
{
"cell_type": "code",
"execution_count": 28,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
" city country location parameter value unit \\\n",
"0 Antwerpen BE BETR801 pm25 18.0 µg/m³ \n",
"1 Antwerpen BE BETR801 pm25 6.5 µg/m³ \n",
"177 London GB London Westminster pm25 7.0 µg/m³ \n",
"178 London GB London Westminster pm25 7.0 µg/m³ \n",
"1825 Paris FR FR04014 no2 20.0 µg/m³ \n",
"1826 Paris FR FR04014 no2 21.8 µg/m³ \n",
"\n",
" coordinates.longitude \n",
"0 4.43182 \n",
"1 4.43182 \n",
"177 -0.13193 \n",
"178 -0.13193 \n",
"1825 2.39390 \n",
"1826 2.39390 "
]
},
"execution_count": 28,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"import pandas as pd\n",
"air_quality = pd.read_csv(\"air_quality_long.csv\", index_col=\"date.utc\", parse_dates=True)\n",
"stations_coord = pd.read_csv(\"air_quality_stations.csv\")\n",
"air_quality = pd.merge(air_quality, stations_coord, how=\"left\", on=\"location\")\n",
"air_quality.groupby([\"location\"]).head(2)"
]
},
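{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Vectorized Operations\n",
"Suppose we need to convert every passenger name in the Titanic dataset to uppercase. A natural first attempt is an explicit loop over the rows:"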
]
},
{
"cell_type": "code",
"execution_count": 29,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
" PassengerId Survived Pclass \\\n",
"0 1 0 3 \n",
"1 2 1 1 \n",
"2 3 1 3 \n",
"3 4 1 1 \n",
"4 5 0 3 \n",
"\n",
" Name Sex Age SibSp \\\n",
"0 BRAUND, MR. OWEN HARRIS male 22.0 1 \n",
"1 CUMINGS, MRS. JOHN BRADLEY (FLORENCE BRIGGS TH... female 38.0 1 \n",
"2 HEIKKINEN, MISS. LAINA female 26.0 0 \n",
"3 FUTRELLE, MRS. JACQUES HEATH (LILY MAY PEEL) female 35.0 1 \n",
"4 ALLEN, MR. WILLIAM HENRY male 35.0 0 \n",
"\n",
" Parch Ticket Fare Cabin Embarked \n",
"0 0 A/5 21171 7.2500 NaN S \n",
"1 0 PC 17599 71.2833 C85 C \n",
"2 0 STON/O2. 3101282 7.9250 NaN S \n",
"3 0 113803 53.1000 C123 S \n",
"4 0 373450 8.0500 NaN S "
]
},
"execution_count": 29,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"import pandas as pd\n",
"titanic = pd.read_csv(\"titanic.csv\")\n",
"\n",
"for i in titanic.iterrows():\n",
" titanic.loc[i[0], \"Name\"] = titanic.loc[i[0], \"Name\"].upper()\n",
"\n",
"titanic.head()"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"This looks reasonable, but explicit row-by-row loops in Python are slow. pandas provides a more efficient way:"
]
},
{
"cell_type": "code",
"execution_count": 30,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
" PassengerId Survived Pclass \\\n",
"0 1 0 3 \n",
"1 2 1 1 \n",
"2 3 1 3 \n",
"3 4 1 1 \n",
"4 5 0 3 \n",
"\n",
" Name Sex Age SibSp \\\n",
"0 BRAUND, MR. OWEN HARRIS male 22.0 1 \n",
"1 CUMINGS, MRS. JOHN BRADLEY (FLORENCE BRIGGS TH... female 38.0 1 \n",
"2 HEIKKINEN, MISS. LAINA female 26.0 0 \n",
"3 FUTRELLE, MRS. JACQUES HEATH (LILY MAY PEEL) female 35.0 1 \n",
"4 ALLEN, MR. WILLIAM HENRY male 35.0 0 \n",
"\n",
" Parch Ticket Fare Cabin Embarked \n",
"0 0 A/5 21171 7.2500 NaN S \n",
"1 0 PC 17599 71.2833 C85 C \n",
"2 0 STON/O2. 3101282 7.9250 NaN S \n",
"3 0 113803 53.1000 C123 S \n",
"4 0 373450 8.0500 NaN S "
]
},
"execution_count": 30,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"import pandas as pd\n",
"titanic = pd.read_csv(\"titanic.csv\")\n",
"\n",
"titanic[\"Name\"] = titanic[\"Name\"].map(lambda x: x.upper())\n",
"\n",
"titanic.head()"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Both approaches produce the same result. Let's compare how long they take."
]
},
{
"cell_type": "code",
"execution_count": 31,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Time with for loop: 90.81252600003609 ms\n",
"Time with map: 0.3222920001917373 ms\n"
]
}
],
"source": [
"import pandas as pd\n",
"import time\n",
"titanic = pd.read_csv(\"titanic.csv\")\n",
"\n",
"T1 = time.perf_counter()\n",
"for i in titanic.iterrows():\n",
" titanic.loc[i[0], \"Name\"] = titanic.loc[i[0], \"Name\"].upper()\n",
"T2 = time.perf_counter()\n",
"print(\"Time with for loop:\", (T2 - T1) * 1000, \"ms\")\n",
"\n",
"titanic = pd.read_csv(\"titanic.csv\")\n",
"T1 = time.perf_counter()\n",
"titanic[\"Name\"] = titanic[\"Name\"].map(lambda x: x.upper())\n",
"T2 = time.perf_counter()\n",
"print(\"Time with map:\", (T2 - T1) * 1000, \"ms\")"
]
},
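{
"cell_type": "markdown",
"metadata": {},
"source": [
"In this run, `map` is roughly 280 times faster than the explicit loop (about 90.8 ms versus 0.32 ms), which saves a great deal of time on large datasets. pandas has similar functions such as `apply` and `applymap` (renamed to `DataFrame.map` in pandas 2.1) that suit different use cases, and for string columns the `.str` accessor, for example `titanic[\"Name\"].str.upper()`, does the same job without a lambda. See the pandas documentation for details."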
]
}
],
"metadata": {
"kernelspec": {
"display_name": "JBtest",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.11.5"
}
},
"nbformat": 4,
"nbformat_minor": 2
}