sc._jsc.sc() is the right way to access the underlying SparkContext. To illustrate:

>>> sc._jsc.sc()
JavaObject id=o27
>>> sc._jsc.sc().version()
u'1.1.0'
>>> sc._jsc.sc().defaultMinSplits()
2

The problem that you're seeing here is that Py4J's help command has trouble displaying the help for this class (possibly a Py4J bug)....
So I've got an example of this in a branch that I'm working on for Sparkling Pandas. The branch lives at https://github.com/holdenk/sparklingpandas/tree/add-kurtosis-support and the PR is at https://github.com/sparklingpandas/sparklingpandas/pull/90. As it stands, it looks like you have two different gateway servers, which seems like it might cause some problems; instead...
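Since the answer is cut off above, here is a hedged sketch of the likely alternative: reusing the single Py4J gateway that PySpark already created rather than starting a second GatewayServer. It assumes `sc` is an active SparkContext; the helper class name is hypothetical.

from py4j.java_gateway import java_import

# Reuse PySpark's existing gateway instead of constructing a second
# JavaGateway/GatewayServer pair (two servers can conflict).
gateway = sc._gateway        # the JavaGateway PySpark created at startup
jvm = gateway.jvm            # entry point into the driver JVM

# Make your own classes visible on this gateway's JVM view.
# com.example.KurtosisHelper is a made-up name for illustration.
java_import(jvm, "com.example.KurtosisHelper")
helper = jvm.com.example.KurtosisHelper()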
Py4J does not generally require type casting because of its heavy use of reflection. If you call a method on an object, Py4J will use the JVM reflection facilities to find this method in the class hierarchy of the object no matter what the advertised type of the object is....
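As a small illustration of this (a sketch, assuming a GatewayServer is already running on the JVM side with default settings):

from py4j.java_gateway import JavaGateway

gateway = JavaGateway()   # connects to the running GatewayServer
jvm = gateway.jvm

lst = jvm.java.util.ArrayList()
lst.add(jvm.java.lang.StringBuilder("hello"))

# ArrayList.get() is declared to return java.lang.Object, but no cast
# is needed: Py4J resolves methods by reflection on the runtime class.
sb = lst.get(0)
sb.append(" world")       # a StringBuilder method, found by reflection
print(sb.toString())      # prints 'hello world'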
You can add external jars as arguments to pyspark:

pyspark --jars file1.jar,file2.jar ...
You can call a Python method from Java by implementing a Java interface on the Python side. The steps are:

1. Create an interface in Java, e.g., py4j.examples.Operator.
2. In Python, create a class and, inside that class, a nested class named Java with an "implements" field listing the interface.
3. In Python, instantiate a gateway with...
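Putting those steps together, here is a minimal sketch of the Python side. The interface's method name (apply) and the entry-point call are assumptions, since the original example is truncated; the essential parts are the nested Java class with the implements list and the callback server.

from py4j.java_gateway import JavaGateway, CallbackServerParameters

class PythonOperator(object):
    """Implements the Java interface py4j.examples.Operator."""

    def apply(self, a, b):   # method name assumed to match the interface
        return a + b

    class Java:
        implements = ["py4j.examples.Operator"]

# The callback server is what lets the JVM call back into this
# Python process.
gateway = JavaGateway(
    callback_server_parameters=CallbackServerParameters())

operator = PythonOperator()
# Hand `operator` to any Java method expecting an Operator, e.g.:
# gateway.entry_point.runOperation(operator)  # entry-point method is hypothetical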
There is no public API to retrieve all imported classes, but you can open a feature request. In the meantime, you can use the internal API; it may change in the future, but this part of it has been stable since early releases:

from py4j.java_gateway import java_import, JavaGateway
...
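For context, a minimal sketch of how classes get imported onto a JVM view in the first place, using only the public java_import API (it assumes a running GatewayServer):

from py4j.java_gateway import java_import, JavaGateway

gateway = JavaGateway()   # connects to the running GatewayServer
java_import(gateway.jvm, "java.util.ArrayList")
java_import(gateway.jvm, "java.util.HashMap")

# After java_import, the classes are addressable on the view without
# their package prefix:
lst = gateway.jvm.ArrayList()
lst.add("one")
print(lst)   # prints [one] via the Java toString()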