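>>> v=sc.parallelize(["one", "two", "two", "three", "three", "three"])
>>> v2=v.map(lambda x: (x,1))
>>> v2.collect()
[('one', 1), ('two', 1), ('two', 1), ('three', 1), ('three', 1), ('three', 1)]
>>> v3=v2.groupByKey()
>>> v3.collect()
[('one', <pyspark.resultiterable.ResultIterable object at 0x...>), ('two', <pyspark.resultiterable.ResultIterable object at 0x...>), ('three', <pyspark.resultiterable.ResultIterable object at 0x...>)]
>>> v4=v3.filter(lambda x:len(x[1].data)>2)
>>> v4.collect()
[('three', <pyspark.resultiterable.ResultIterable object at 0x...>)]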
The filter keeps only the keys that occur more than 2 times; here that is 'three', which appears 3 times.
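The same group-then-filter logic can be traced without a Spark cluster. The following is a minimal plain-Python sketch, not Spark code: `group_by_key` is a hypothetical helper that only imitates what `groupByKey` does to (key, value) pairs, and the variable names are illustrative.

```python
from collections import defaultdict


def group_by_key(pairs):
    """Collect the values of (key, value) pairs into one list per key.

    Hypothetical stand-in for RDD.groupByKey(); runs locally on a list.
    """
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return list(groups.items())


words = ["one", "two", "two", "three", "three", "three"]

# Mirrors v.map(lambda x: (x, 1)): pair every word with a count of 1.
pairs = [(w, 1) for w in words]

# Mirrors v2.groupByKey(): one entry per distinct word, holding all its 1s.
grouped = group_by_key(pairs)

# Mirrors the filter step: keep keys whose group has more than 2 values.
frequent = [(key, values) for key, values in grouped if len(values) > 2]

print(frequent)  # [('three', [1, 1, 1])]
```

Each group holds one `1` per occurrence of the word, so the length of the group is the occurrence count, which is why the filter in the session above compares a length against 2.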
This article is reposted from 张昺华-sky's blog on cnblogs (博客园). Original link: http://www.cnblogs.com/bonelee/p/7764934.html. To repost it, please contact the original author.