Machine Learning with Python - Ecosystem

An Introduction to Python

Python is a popular, object-oriented, high-level programming language. Its easy-to-learn syntax and portability have made it widely used. The following facts introduce Python −

  • Python was developed by Guido van Rossum at Stichting Mathematisch Centrum in the Netherlands.

  • It was written as the successor to a programming language named ‘ABC’.

  • Its first version was released in 1991.

  • Guido van Rossum took the name Python from a TV show named Monty Python’s Flying Circus.

  • It is an open source programming language, which means that we can freely download it and use it to develop programs. It can be downloaded from www.python.

  • Python combines features of both C and Java: it offers concise, C-like procedural code and, like Java, classes and objects for object-oriented programming.

  • It is an interpreted language, which means the source code of a Python program is first converted into bytecode and then executed by the Python virtual machine.
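The bytecode step mentioned above can be observed directly. The following is a minimal sketch, assuming the standard CPython interpreter; the function name add and the sample source string are illustrative only, and the instruction listing printed by dis differs between Python versions.

```python
import dis


def add(a, b):
    """A tiny function whose bytecode we can inspect."""
    return a + b


# compile() turns source text into a code object that holds bytecode.
code_obj = compile("total = 1 + 2", "<example>", "exec")
print(type(code_obj).__name__)               # code

# The raw bytecode is stored as a bytes object on the function's code object.
print(type(add.__code__.co_code).__name__)   # bytes

# dis.dis() prints a human-readable listing of the bytecode instructions.
dis.dis(add)
```

The listing is what the virtual machine executes; the source text itself is never run directly.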

Strengths and Weaknesses of Python

Like every programming language, Python has both strengths and weaknesses.

Strengths

According to studies and surveys, Python is the fifth most important language and the most popular language for machine learning and data science. This is due to the following strengths −

Easy to learn and understand − Python's syntax is simple, so the language is relatively easy to learn and understand, even for beginners.

Multi-purpose language − Python is a multi-purpose programming language because it supports structured, object-oriented and functional programming.

Huge number of modules − Python has a huge number of modules covering nearly every aspect of programming. These modules are readily available, which makes Python an extensible language.

Support of open source community − As an open source programming language, Python is supported by a very large developer community, so bugs tend to be found and fixed quickly. This makes Python robust and adaptive.

Scalability − Python is a scalable programming language because it provides better structure for supporting large programs than shell scripts do.
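To make the multi-purpose point concrete, here is a small sketch that computes the same sum of squares in a structured, an object-oriented and a functional style. All names (sum_of_squares_loop, SquareSummer, sum_of_squares_functional) are made up for this illustration.

```python
from functools import reduce

numbers = [1, 2, 3, 4]


# Structured (procedural) style: an explicit loop and an accumulator.
def sum_of_squares_loop(values):
    total = 0
    for v in values:
        total += v * v
    return total


# Object-oriented style: data and behaviour bundled in a class.
class SquareSummer:
    def __init__(self, values):
        self.values = list(values)

    def total(self):
        return sum(v * v for v in self.values)


# Functional style: map and reduce, with no mutable state.
def sum_of_squares_functional(values):
    return reduce(lambda acc, sq: acc + sq, map(lambda v: v * v, values), 0)


print(sum_of_squares_loop(numbers))         # 30
print(SquareSummer(numbers).total())        # 30
print(sum_of_squares_functional(numbers))   # 30
```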

Weakness

Although Python is a popular and powerful programming language, its main weakness is slow execution speed.

Python executes slowly compared to compiled languages because it is an interpreted language. This is a major area of improvement for the Python community.

Installing Python

To work in Python, we must first install it. You can install Python in either of the following two ways −

  • Installing Python individually

  • Using a pre-packaged Python distribution − Anaconda

Let us discuss each of these in detail.

Installing Python Individually

If you want to install Python on your computer, you only need to download the binary code applicable to your platform. Python distributions are available for Windows, Linux and Mac platforms.

The following is a quick overview of installing Python on the above-mentioned platforms −

On Unix and Linux platform

We can install Python on Unix and Linux platforms with the following steps −

  • First, go to www.python/downloads/.

  • Next, click on the link to download the zipped source code available for Unix/Linux.

  • Download and extract the files.

  • Optionally, edit the Modules/Setup file to customize some options.

  • Run the ./configure script.

  • Run make.

  • Run make install.

On Windows platform

We can install Python on the Windows platform with the following steps −

  • First, go to www.python/downloads/.

  • Next, click on the link for the Windows installer file python-XYZ.msi, where XYZ is the version we wish to install.

  • Now, run the downloaded file. It opens the Python install wizard, which is easy to use. Accept the default settings and wait until the installation finishes.

On Macintosh platform

For Mac OS X, Homebrew, an easy-to-use package installer, is recommended for installing Python 3. If you don't have Homebrew, you can install it with the following command −


$ ruby -e "$(curl -fsSL https://raw.githubusercontent/Homebrew/install/master/install)"

Homebrew can be updated with the command below −


$ brew update

Now, to install Python 3 on your system, run the following command −


$ brew install python3

Using Pre-packaged Python Distribution: Anaconda

Anaconda is a packaged distribution of Python that bundles the libraries widely used in data science. We can set up a Python environment using Anaconda with the following steps −

  • Step 1 − First, download the required installation package from the Anaconda distribution. The link is www.anaconda/distribution/. You can choose between Windows, Mac and Linux as per your requirement.

  • Step 2 − Next, select the Python version you want to install on your machine. At the time of writing, the latest Python version was 3.7. Both 64-bit and 32-bit graphical installers are offered.

  • Step 3 − After you select the OS and Python version, the Anaconda installer is downloaded to your computer. Double-click the file and the installer will install the Anaconda package.

  • Step 4 − To check whether it is installed, open a command prompt and type python.
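Once the interpreter starts, a short script can confirm which Python is actually running. This is a minimal sketch using only the standard library; the helper name describe_interpreter is hypothetical.

```python
import platform
import sys


def describe_interpreter():
    """Collect basic facts about the Python interpreter running this script."""
    return {
        "version": platform.python_version(),            # e.g. "3.x.y"
        "executable": sys.executable,                     # path to the interpreter binary
        "implementation": platform.python_implementation(),
    }


info = describe_interpreter()
for key, value in info.items():
    print(f"{key}: {value}")
```

With an Anaconda install, the executable path typically points inside the Anaconda directory, which is a quick way to tell it apart from another Python on the same machine.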

You can also follow these steps in the detailed video lecture at www.tutorialspoint/python_essentials_online_training/getting_started_with_anaconda.asp.

Why Python for Data Science?

Python is the fifth most important language and the most popular language for machine learning and data science. The following features make it the preferred language for data science −

Extensive set of packages

Python has an extensive and powerful set of packages that are ready to be used in various domains. These include packages such as numpy, scipy, pandas and scikit-learn, which are required for machine learning and data science.

Easy prototyping

Another important feature that makes Python the language of choice for data science is easy and fast prototyping, which is useful for developing new algorithms.

Collaboration feature

Data science requires good collaboration, and Python provides many useful tools that make this extremely easy.

One language for many domains

A typical data science project spans various domains such as data extraction, data manipulation, data analysis, feature extraction, modelling, evaluation, deployment and updating the solution. As Python is a multi-purpose language, it allows the data scientist to address all these domains from a common platform.

Components of Python ML Ecosystem

In this section, let us discuss some core data science libraries that form the components of the Python machine learning ecosystem. These components make Python an important language for data science. Though there are many such components, let us discuss some of the important ones here −

Jupyter Notebook

Jupyter notebooks provide an interactive computational environment for developing Python-based data science applications. They were formerly known as IPython notebooks. The following features make Jupyter notebooks one of the best components of the Python ML ecosystem −

  • Jupyter notebooks can illustrate the analysis process step by step by arranging code, images, text, output and so on in sequence.

  • They help a data scientist document the thought process while developing the analysis.

  • One can also capture the results as part of the notebook.

  • With the help of Jupyter notebooks, we can also share our work with peers.

Installation and Execution

If you are using the Anaconda distribution, you need not install Jupyter notebook separately, as it is already installed with it. Just go to the Anaconda Prompt and type the following command −


C:\>jupyter notebook

After you press Enter, a notebook server starts at localhost:8888 on your computer.

Now, after clicking the New tab, you will get a list of options. Select Python 3 and it will open a new notebook for you to start working in.

On the other hand, if you are using a standard Python distribution, Jupyter notebook can be installed using the popular Python package installer, pip.


pip install jupyter

Types of Cells in Jupyter Notebook

The following are the three types of cells in a Jupyter notebook −

Code cells − As the name suggests, we use these cells to write code. When a code cell is run, its content is sent to the kernel associated with the notebook.

Markdown cells − We use these cells to annotate the computation process. They can contain text, images, LaTeX equations, HTML tags and so on.

Raw cells − The text written in them is displayed as it is. These cells are used to add text that we do not want the automatic conversion mechanism of Jupyter notebook to convert.
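A notebook file is a JSON document in which each cell records its type. The sketch below builds such a structure with only the standard json module; it follows version 4 of the notebook format as I understand it, and the helper name make_cell and the cell contents are hypothetical. In practice the Jupyter tools write this file for you.

```python
import json


def make_cell(cell_type, source):
    """Build a minimal cell dictionary for one of the three cell types."""
    cell = {"cell_type": cell_type, "metadata": {}, "source": source}
    if cell_type == "code":
        # Code cells additionally track their run count and their outputs.
        cell["execution_count"] = None
        cell["outputs"] = []
    return cell


notebook = {
    "cells": [
        make_cell("markdown", "# Analysis notes\nSome *formatted* text."),
        make_cell("code", "print(1 + 1)"),
        make_cell("raw", "Text passed through unchanged."),
    ],
    "metadata": {},
    "nbformat": 4,
    "nbformat_minor": 4,
}

# Serialize and read back, as a notebook file on disk would be.
text = json.dumps(notebook, indent=1)
print([c["cell_type"] for c in json.loads(text)["cells"]])
# ['markdown', 'code', 'raw']
```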

For a more detailed study of Jupyter notebook, see www.tutorialspoint/jupyter/index.htm.

NumPy

NumPy is another useful component that makes Python one of the favorite languages for data science. The name stands for Numerical Python, and the library is built around multidimensional array objects. By using NumPy, we can perform the following important operations −

  • Mathematical and logical operations on arrays.

  • Fourier transforms.

  • Operations associated with linear algebra.
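The three kinds of operations listed above can be sketched in a few lines. The arrays below are made-up sample values chosen so the results are easy to check by hand.

```python
import numpy as np

a = np.array([1.0, 2.0, 3.0, 4.0])
b = np.array([4.0, 3.0, 2.0, 1.0])

# Mathematical and logical operations work element by element.
print(a + b)                          # [5. 5. 5. 5.]
print(a > b)                          # [False False  True  True]
print(np.logical_and(a > 1, b > 1))   # [False  True  True False]

# Fourier transform of a short signal (one cosine cycle sampled 4 times).
signal = np.array([1.0, 0.0, -1.0, 0.0])
spectrum = np.fft.fft(signal)
print(np.abs(spectrum))               # magnitudes of the frequency components

# Linear algebra: solve the system 2x + y = 3, x + 3y = 5.
coeffs = np.array([[2.0, 1.0], [1.0, 3.0]])
rhs = np.array([3.0, 5.0])
solution = np.linalg.solve(coeffs, rhs)
print(solution)                       # approximately x = 0.8, y = 1.4
```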

NumPy can also be seen as a replacement for MATLAB, because it is mostly used along with SciPy (Scientific Python) and Matplotlib (plotting library).

Installation and Execution

If you are using the Anaconda distribution, there is no need to install NumPy separately, as it is already installed with it. You just need to import the package into your Python script as follows −


import numpy as np

On the other hand, if you are using a standard Python distribution, NumPy can be installed using the popular Python package installer, pip.


pip install numpy

For a more detailed study of NumPy, see www.tutorialspoint/numpy/index.htm.

Pandas

Pandas is another useful Python library that makes Python one of the favorite languages for data science. It is used for data manipulation, wrangling and analysis. It was developed by Wes McKinney in 2008. With the help of Pandas, we can accomplish the following five steps in data processing −

  • Load

  • Prepare

  • Manipulate

  • Model

  • Analyze

Data representation in Pandas

Data in Pandas is represented with the help of the following three data structures −

Series − It is a one-dimensional ndarray with axis labels, which means it is like a simple array with homogeneous data. For example, the following series is a collection of the integers 1, 5, 10, 15, 24, 25 and so on −

1   5   10   15   24   25   28   36   40   89
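The series of integers shown above can be built as follows. This is a small sketch; the second series, with student names as labels, uses made-up values to show what an axis label is.

```python
import pandas as pd

# A Series holding the ten integers from the text, with the default 0..9 index.
s = pd.Series([1, 5, 10, 15, 24, 25, 28, 36, 40, 89])
print(len(s))          # 10
print(s.iloc[0])       # 1
print(s.sum())         # 273

# The axis labels do not have to be integers.
ages = pd.Series([15, 14], index=["Aarav", "Harshit"])
print(ages["Harshit"])  # 14
```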

Data frame − It is the most useful data structure and is used for almost all kinds of data representation and manipulation in pandas. It is a two-dimensional data structure that can contain heterogeneous data. Generally, tabular data is represented using data frames. For example, the following table shows student data with name, roll number, age and gender −

Name      Roll number   Age   Gender
Aarav     1             15    Male
Harshit   2             14    Male
Kanika    3             16    Female
Mayank    4             15    Male
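The student table can be expressed as a DataFrame, as in this short sketch. It also shows that a single column of a DataFrame is a Series, which is the sense in which the higher-dimensional structure contains the lower-dimensional one.

```python
import pandas as pd

# Each dictionary key becomes a column; columns may hold different types.
students = pd.DataFrame({
    "Name": ["Aarav", "Harshit", "Kanika", "Mayank"],
    "Roll number": [1, 2, 3, 4],
    "Age": [15, 14, 16, 15],
    "Gender": ["Male", "Male", "Female", "Male"],
})

print(students.shape)                  # (4, 4)
print(students["Age"].mean())          # 15.0

# Selecting one column yields a Series.
print(type(students["Age"]).__name__)  # Series

# Boolean filtering keeps only the rows that satisfy a condition.
older = students[students["Age"] >= 15]
print(list(older["Name"]))             # ['Aarav', 'Kanika', 'Mayank']
```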

Panel − It is a 3-dimensional data structure containing heterogeneous data. A panel is difficult to represent graphically, but it can be thought of as a container of DataFrames. Note that Panel was deprecated and later removed from pandas (it is absent from pandas 0.25 and later), so it is only available in older releases.
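Because recent pandas releases no longer include Panel, the "container of DataFrames" idea is sketched here with a different technique: stacking several DataFrames under an extra index level using pd.concat. The term names, subjects and marks are made-up sample data, and this is one possible substitute rather than the original Panel API.

```python
import pandas as pd

# Two DataFrames with the same shape, one per school term (sample data).
term1 = pd.DataFrame({"Maths": [80, 65], "Science": [72, 90]},
                     index=["Aarav", "Kanika"])
term2 = pd.DataFrame({"Maths": [85, 70], "Science": [75, 88]},
                     index=["Aarav", "Kanika"])

# Passing a dict to concat adds the keys as an outer index level,
# which plays the role of the third dimension.
stacked = pd.concat({"term1": term1, "term2": term2}, names=["Term", "Student"])

print(stacked.index.nlevels)                           # 2
print(stacked.loc[("term1", "Aarav"), "Maths"])        # 80
print(stacked.loc["term2"].loc["Kanika", "Science"])   # 88
```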

The following table gives the dimension and description of the above-mentioned data structures used in Pandas −

Data Structure   Dimension   Description
Series           1-D         Size-immutable, 1-D homogeneous data
DataFrame        2-D         Size-mutable, heterogeneous data in tabular form
Panel            3-D         Size-mutable array, container of DataFrames

These data structures can be understood as nested containers: each higher-dimensional structure is a container of the lower-dimensional one.

Installation and Execution

If you are using the Anaconda distribution, there is no need to install Pandas separately, as it is already installed with it. You just need to import the package into your Python script as follows −


import pandas as pd

On the other hand, if you are using a standard Python distribution, Pandas can be installed using the popular Python package installer, pip.


pip install pandas

After installing Pandas, you can import it into your Python script as shown above.

Example

The following is an example of creating a series from an ndarray by using Pandas −


In [1]: import pandas as pd
In [2]: import numpy as np
In [3]: data = np.array(['g', 'a', 'u', 'r', 'a', 'v'])
In [4]: s = pd.Series(data)
In [5]: print(s)
0    g
1    a
2    u
3    r
4    a
5    v
dtype: object

For a more detailed study of Pandas, see www.tutorialspoint/python_pandas/index.htm.

Scikit-learn

Scikit-learn is another useful and highly important Python library for data science and machine learning. The following features make it so useful −

  • It is built on NumPy, SciPy, and Matplotlib.

  • It is open source and can be reused under the BSD license.

  • It is accessible to everybody and can be reused in various contexts.

  • It implements a wide range of machine learning algorithms covering major areas of ML such as classification, clustering, regression, dimensionality reduction and model selection.

Installation and Execution

If you are using the Anaconda distribution, there is no need to install Scikit-learn separately, as it is already installed with it. You just need to import the package in your Python script. For example, the following line imports the breast cancer dataset loader from Scikit-learn −


from sklearn.datasets import load_breast_cancer

On the other hand, if you are using a standard Python distribution and already have NumPy and SciPy, Scikit-learn can be installed using the popular Python package installer, pip.


pip install -U scikit-learn

After installing Scikit-learn, you can use it in your Python script as shown above.
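To show how the imported dataset might be used for classification, here is a minimal sketch. The choice of a scaled logistic regression, the 25% test split and the random seed are assumptions made for this illustration; they do not come from the original text, and the exact accuracy depends on the Scikit-learn version.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Load the bundled dataset: 569 samples with 30 numeric features each.
data = load_breast_cancer()
X, y = data.data, data.target
print(X.shape)   # (569, 30)

# Hold out a quarter of the samples to estimate accuracy on unseen data.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0, stratify=y
)

# Standardize the features, then fit a logistic regression classifier.
model = make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000))
model.fit(X_train, y_train)

accuracy = model.score(X_test, y_test)
print(f"Test accuracy: {accuracy:.3f}")
```

Swapping in another estimator, such as a decision tree, only changes the last pipeline step, since Scikit-learn estimators share the same fit and score interface.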

Translated from: https://www.tutorialspoint/machine_learning_with_python/machine_learning_with_python_ecosystem.htm
