Apache Spark ML Pipeline: filter empty rows in dataset
In my Spark ML pipeline (Spark 2.3.0), I use RegexTokenizer like this:
import org.apache.spark.ml.feature.RegexTokenizer

val regexTokenizer = new RegexTokenizer()
  .setInputCol("text")
  .setOutputCol("words")
  .setMinTokenLength(3)
It transforms the DataFrame into one with an array of words per row, for example:
text      | words
----------+-------------
a the     | [the]
a of to   | []
big small | [big, small]
How can I filter out the rows with empty arrays?
Should I create a custom transformer and add it to the pipeline?
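If you do take the custom-transformer route, a minimal sketch could look like the following. It is not part of Spark: the class name EmptyArrayFilter and its inputCol param are hypothetical, and it assumes the Spark 2.x ML developer APIs (Transformer, Param, DefaultParamsWritable/DefaultParamsReadable) so that the stage can be saved and loaded together with the rest of the pipeline.

```scala
import org.apache.spark.ml.Transformer
import org.apache.spark.ml.param.{Param, ParamMap}
import org.apache.spark.ml.util.{DefaultParamsReadable, DefaultParamsWritable, Identifiable}
import org.apache.spark.sql.{DataFrame, Dataset}
import org.apache.spark.sql.functions.{col, size}
import org.apache.spark.sql.types.StructType

// Hypothetical stage: drops rows whose array column is empty.
class EmptyArrayFilter(override val uid: String)
    extends Transformer with DefaultParamsWritable {

  def this() = this(Identifiable.randomUID("emptyArrayFilter"))

  // Name of the array column to check, e.g. the tokenizer's output column.
  final val inputCol: Param[String] =
    new Param[String](this, "inputCol", "array column that must be non-empty")

  def setInputCol(value: String): this.type = set(inputCol, value)

  // Keep only rows whose array has at least one element.
  // Rows with a null array are dropped as well, since size(null) is never > 0.
  override def transform(dataset: Dataset[_]): DataFrame =
    dataset.where(size(col($(inputCol))) > 0).toDF()

  // Filtering rows does not change the schema.
  override def transformSchema(schema: StructType): StructType = schema

  override def copy(extra: ParamMap): EmptyArrayFilter = defaultCopy(extra)
}

// Companion object so a saved pipeline containing this stage can be loaded again.
object EmptyArrayFilter extends DefaultParamsReadable[EmptyArrayFilter]
```

It would then be placed after the tokenizer, e.g. new Pipeline().setStages(Array(regexTokenizer, new EmptyArrayFilter().setInputCol("words"))). Keep in mind that the filter runs at both fit and transform time, so rows without tokens also disappear from the model's output.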
scala apache-spark-mllib apache-spark-ml
asked 22 hours ago
Igorock
1 Answer
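import org.apache.spark.sql.functions.size
import spark.implicits._ // enables the $"colName" syntax

df
  .select($"text", $"words")
  .where(size($"words") > 0)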
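A self-contained version of the snippet above, run on the example rows from the question, might look like this. It assumes a local SparkSession (master "local[*]") purely for experimentation; the object name FilterEmptyTokens and the helper tokenizeAndFilter are hypothetical.

```scala
import org.apache.spark.ml.feature.RegexTokenizer
import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.functions.size

object FilterEmptyTokens {
  // Builds the example data, tokenizes it, and drops rows with no tokens left.
  def tokenizeAndFilter(spark: SparkSession): DataFrame = {
    import spark.implicits._ // enables toDF and the $"col" syntax

    val df = Seq("a the", "a of to", "big small").toDF("text")

    val regexTokenizer = new RegexTokenizer()
      .setInputCol("text")
      .setOutputCol("words")
      .setMinTokenLength(3)

    regexTokenizer.transform(df)
      .select($"text", $"words")
      .where(size($"words") > 0) // size is 0 for the "a of to" row
  }

  def main(args: Array[String]): Unit = {
    // Local session for experimentation only.
    val spark = SparkSession.builder()
      .appName("filter-empty-tokens")
      .master("local[*]")
      .getOrCreate()
    tokenizeAndFilter(spark).show(truncate = false)
    spark.stop()
  }
}
```

With the default whitespace pattern and setMinTokenLength(3), every token in "a of to" is shorter than three characters, so only the "a the" and "big small" rows should remain.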
answered 22 hours ago
Terry Dactyl
I cannot do this because I use a pipeline. I could create a custom transformer to wrap this code, but I want to know whether there are other options.
– Igorock
22 hours ago
While this code may answer the question, providing additional context regarding how and/or why it solves the problem would improve the answer's long-term value.
– Nic3500
13 hours ago
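One pipeline-compatible option that avoids writing a custom class is Spark ML's built-in SQLTransformer, which runs a SQL statement against its input, referenced as __THIS__. A sketch under the assumption that the tokenizer output column is called "words"; the object and method names are hypothetical:

```scala
import org.apache.spark.ml.{Pipeline, PipelineModel}
import org.apache.spark.ml.feature.{RegexTokenizer, SQLTransformer}
import org.apache.spark.sql.SparkSession

object PipelineWithFilter {
  def buildPipeline(): Pipeline = {
    val regexTokenizer = new RegexTokenizer()
      .setInputCol("text")
      .setOutputCol("words")
      .setMinTokenLength(3)

    // __THIS__ is SQLTransformer's placeholder for the stage's input DataFrame.
    val dropEmpty = new SQLTransformer()
      .setStatement("SELECT * FROM __THIS__ WHERE size(words) > 0")

    new Pipeline().setStages(Array(regexTokenizer, dropEmpty))
  }

  def main(args: Array[String]): Unit = {
    // Local session for experimentation only.
    val spark = SparkSession.builder()
      .appName("pipeline-with-filter")
      .master("local[*]")
      .getOrCreate()
    import spark.implicits._

    val df = Seq("a the", "a of to", "big small").toDF("text")
    val model: PipelineModel = buildPipeline().fit(df)
    model.transform(df).show(truncate = false)
    spark.stop()
  }
}
```

As with any filtering stage, the WHERE clause is applied both when fitting and when calling model.transform, so rows that end up with no tokens are silently dropped from the output as well.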