The code is shown below; the steps of the workflow are explained in the code comments:
# -*- coding: utf-8 -*-
import pandas as pd
from pyspark.sql import SparkSession

if __name__ == "__main__":
    # Step 1: initialize a pandas DataFrame
    df = pd.DataFrame([[1, 2, 3], [4, 5, 6]],
                      index=['row1', 'row2'],
                      columns=['c1', 'c2', 'c3'])
    # Print the pandas data
    print(df)

    # Step 2: initialize a SparkSession (the unified entry point;
    # a separate SparkContext/SQLContext is no longer needed)
    spark = SparkSession \
        .builder \
        .appName("testDataFrame") \
        .getOrCreate()

    # Step 3: initialize a Spark DataFrame
    sentenceData = spark.createDataFrame([
        (0.0, "I like Spark"),
        (1.0, "Pandas is useful"),
        (2.0, "They are coded by Python")
    ], ["label", "sentence"])
    # Show the Spark data
    sentenceData.select("label").show()

    # Step 4: pandas.DataFrame -> pyspark.sql.DataFrame
    spark_df = spark.createDataFrame(df)
    # Show the converted data
    spark_df.select("c1").show()

    # Step 5: pyspark.sql.DataFrame -> pandas.DataFrame
    pandas_df = sentenceData.toPandas()
    # Print the converted data
    print(pandas_df)
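For readers who want to see the shape of the result without a running Spark session, the DataFrame that `sentenceData.toPandas()` returns is equivalent to one built directly in pandas from the same rows. This is only an illustrative sketch (the data is copied from the example above, not produced by Spark):

```python
import pandas as pd

# The same rows used in the Spark example above, built directly in pandas.
# sentenceData.toPandas() would return an equivalent DataFrame:
# a float64 "label" column and an object "sentence" column, 3 rows.
pandas_df = pd.DataFrame(
    [(0.0, "I like Spark"),
     (1.0, "Pandas is useful"),
     (2.0, "They are coded by Python")],
    columns=["label", "sentence"])
print(pandas_df)
```

Note that `toPandas()` collects the entire distributed DataFrame onto the driver, so it should only be used when the data fits in driver memory.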
Program output:
That is all for this example of converting between pyspark.sql.DataFrame and pandas.DataFrame. I hope it serves as a useful reference, and I hope you will continue to support 服務器之家.
Original article: https://blog.csdn.net/zhurui_idea/article/details/72981715