To get unique values in a column of a PySpark DataFrame, we can use the `distinct()` method. How we apply it depends on whether we want the unique values of a single column or the distinct combinations across multiple columns [1][2][4][5][6].
Here is an example of how to get unique values from a single column:
```python
from pyspark.sql import SparkSession
from pyspark.sql.functions import col

spark = SparkSession.builder.getOrCreate()

# create a sample DataFrame
data = [("Alice", 25), ("Bob", 30), ("Alice", 35), ("Charlie", 40)]
df = spark.createDataFrame(data, ["Name", "Age"])

# get unique values from the "Name" column
unique_names = df.select(col("Name")).distinct().rdd.flatMap(lambda x: x).collect()
print(unique_names)  # e.g. ['Alice', 'Bob', 'Charlie'] (order not guaranteed)
```
In this example, we first create a sample DataFrame with two columns, "Name" and "Age". We use `select()` to keep only the "Name" column, then `distinct()` to drop duplicate values. Finally, `rdd.flatMap(lambda x: x).collect()` unwraps each single-field `Row` into its plain value, giving a list of strings.
Here is an example of how to get the distinct combinations of multiple columns:
```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

# create a sample DataFrame
data = [("Alice", 25), ("Bob", 30), ("Alice", 35), ("Charlie", 40)]
df = spark.createDataFrame(data, ["Name", "Age"])

# get the distinct (Name, Age) combinations as a list of tuples
unique_values = df.select("Name", "Age").distinct().rdd.map(tuple).collect()
print(unique_values)
```
In this example, we select the columns of interest (here, all of them), then call `distinct()` to drop duplicate rows. Finally, `rdd.map(tuple).collect()` converts each `Row` to a tuple, giving a list of tuples. Note that `flatMap` would be wrong here: it would flatten every row into one interleaved list of names and ages rather than preserving the row structure.
Note that `distinct()` returns a new DataFrame containing only the distinct rows; it does not return a Python list. To materialize the values on the driver, we still need `collect()` (which yields `Row` objects) together with `flatMap` or `map` as shown above.