Why is there no permutation in Regexes? (Even if regular languages seem to be able to do this)











up vote
9
down vote

favorite
3












The Problem



There is no easy way to get a permutation with a regex.





  • Permutation: Getting a word $$w=x_1…x_n$$ ("aabc") to another order, without changing number or kind of letters.


  • Regex: Regular expression.


For verification:





  • "Regex permutations without repetition" The answer creates JavaScript code instead of a regex, assuming this would be more simple.


  • "How to find all permutations of a given word in a given text" – The answer doesn't use regexes either.


  • "Regex to match all {1, 2, 3, 4} without repetition" – The answer uses regexes, but it's neither adaptable nor simple.

  • This answer even claims: "A regular expression cannot do what you're asking for. It cannot generate permutations from a string".


The kind of solution I am searching for



It should have the form:




  • »aabc« (or anything else you could use a opening and closing parentheses)

  • (aabc)! (similar to (abc)? but with another symbol in the end)

  • [aabc]! (similar to [abc]+ but with another symbol in the end)


Advantages of these solutions



They are:




  • easy

  • adaptable

  • reusable


Why this should exist




  • Regexes are a way to describe a grammar of a regular language. They have the full power to be any kind of regular language.

  • Let's say, regular languages are powerful enough for permutations (proof below) – why is there no easy way to express this?


So my question is:




  • (Why) Is my proof wrong?

  • If it is right: Why is there no easy way to express permutations?


The proof




  • Regular expressions are one way to note the grammar of a regular language. They can describe any regular languages grammar.

  • Another way to describe any regular languages (that have a finite number of letters within their alphabet) grammar are non-deterministic Automatons (with a finite number of states).


Having a finite number of letters I can create this automaton: (Example. Formal: see below)



Grammar that accepts permutations of "abbc":



(sry for numbers on top, maybe someone knows how to make this part looking better)



s -> ah¹



s -> bh²



s -> ch³



h¹ -> bh¹¹



h¹ -> ch¹²



h² -> ah¹¹ (no typo! equivalence)



h² -> bh²²



h² -> ch²³



h³ -> ah¹²



h³ -> bh²³



h¹¹ -> bc



h¹¹ -> cb



h¹² -> bb



h²² -> ac



h²² -> ca



h²³ -> ab



h²³ -> ba



More formal: (using a finite-state-automaton but this could be made with grammar as well)




  • A word q (with finite length) to which any permutation should reach an accepting state.

  • X is the finite alphabet.

  • Set of states S contains any order of letters up to the length of q. (So the size of S is finite.) Plus one state of "any longer word".

  • state transition function d which takes a letter and moves on the state that corresponds to the now read part of the word.

  • F is a set of that states that are exact permutations of q.


So it is possible to create a finite-state automaton for accepting permutations of a given word.



Moving on with the proof



So I have proven that regular languages have the power to check for permutations, haven't I?



So why is there no approach to reach this with Regexes? It's a useful functionality.










share|cite|improve this question









New contributor




Asqiir is a new contributor to this site. Take care in asking for clarification, commenting, and answering.
Check out our Code of Conduct.
















  • 9




    You can list all permutations of your word with a regular expression. The resulting expression will be quite large, but will definitely be a regular expression.
    – Yuval Filmus
    2 days ago






  • 6




    I suggest ignoring all answers about Theory of Computation on stackoverflow. This isn't that site's specialty.
    – Yuval Filmus
    2 days ago












  • The answer on your linked page here - stackoverflow.com/a/3102205/6936386 - seems to be easily adaptable and not too complicated: ^(a()|a()|b()|c()){4}2345$ seems to work (see regex101.com/r/9URPpg/4/tests).
    – boboquack
    yesterday








  • 5




    @boboquack That isn't a regular expression in the sense in which the term is used in computer science. (This sort of thing is exactly why Yuval suggests not trusting Stack Overflow answers about theoretical CS.)
    – David Richerby
    17 hours ago















up vote
9
down vote

favorite
3












The Problem



There is no easy way to get a permutation with a regex.





  • Permutation: Getting a word $$w=x_1…x_n$$ ("aabc") to another order, without changing number or kind of letters.


  • Regex: Regular expression.


For verification:





  • "Regex permutations without repetition" The answer creates JavaScript code instead of a regex, assuming this would be more simple.


  • "How to find all permutations of a given word in a given text" – The answer doesn't use regexes either.


  • "Regex to match all {1, 2, 3, 4} without repetition" – The answer uses regexes, but it's neither adaptable nor simple.

  • This answer even claims: "A regular expression cannot do what you're asking for. It cannot generate permutations from a string".


The kind of solution I am searching for



It should have the form:




  • »aabc« (or anything else you could use a opening and closing parentheses)

  • (aabc)! (similar to (abc)? but with another symbol in the end)

  • [aabc]! (similar to [abc]+ but with another symbol in the end)


Advantages of these solutions



They are:




  • easy

  • adaptable

  • reusable


Why this should exist




  • Regexes are a way to describe a grammar of a regular language. They have the full power to be any kind of regular language.

  • Let's say, regular languages are powerful enough for permutations (proof below) – why is there no easy way to express this?


So my question is:




  • (Why) Is my proof wrong?

  • If it is right: Why is there no easy way to express permutations?


The proof




  • Regular expressions are one way to note the grammar of a regular language. They can describe any regular languages grammar.

  • Another way to describe any regular languages (that have a finite number of letters within their alphabet) grammar are non-deterministic Automatons (with a finite number of states).


Having a finite number of letters I can create this automaton: (Example. Formal: see below)



Grammar that accepts permutations of "abbc":



(sry for numbers on top, maybe someone knows how to make this part looking better)



s -> ah¹



s -> bh²



s -> ch³



h¹ -> bh¹¹



h¹ -> ch¹²



h² -> ah¹¹ (no typo! equivalence)



h² -> bh²²



h² -> ch²³



h³ -> ah¹²



h³ -> bh²³



h¹¹ -> bc



h¹¹ -> cb



h¹² -> bb



h²² -> ac



h²² -> ca



h²³ -> ab



h²³ -> ba



More formal: (using a finite-state-automaton but this could be made with grammar as well)




  • A word q (with finite length) to which any permutation should reach an accepting state.

  • X is the finite alphabet.

  • Set of states S contains any order of letters up to the length of q. (So the size of S is finite.) Plus one state of "any longer word".

  • state transition function d which takes a letter and moves on the state that corresponds to the now read part of the word.

  • F is a set of that states that are exact permutations of q.


So it is possible to create a finite-state automaton for accepting permutations of a given word.



Moving on with the proof



So I have proven that regular languages have the power to check for permutations, haven't I?



So why is there no approach to reach this with Regexes? It's a useful functionality.










share|cite|improve this question









New contributor




Asqiir is a new contributor to this site. Take care in asking for clarification, commenting, and answering.
Check out our Code of Conduct.
















  • 9




    You can list all permutations of your word with a regular expression. The resulting expression will be quite large, but will definitely be a regular expression.
    – Yuval Filmus
    2 days ago






  • 6




    I suggest ignoring all answers about Theory of Computation on stackoverflow. This isn't that site's specialty.
    – Yuval Filmus
    2 days ago












  • The answer on your linked page here - stackoverflow.com/a/3102205/6936386 - seems to be easily adaptable and not too complicated: ^(a()|a()|b()|c()){4}2345$ seems to work (see regex101.com/r/9URPpg/4/tests).
    – boboquack
    yesterday








  • 5




    @boboquack That isn't a regular expression in the sense in which the term is used in computer science. (This sort of thing is exactly why Yuval suggests not trusting Stack Overflow answers about theoretical CS.)
    – David Richerby
    17 hours ago













up vote
9
down vote

favorite
3









up vote
9
down vote

favorite
3






3





The Problem



There is no easy way to get a permutation with a regex.





  • Permutation: Getting a word $$w=x_1…x_n$$ ("aabc") to another order, without changing number or kind of letters.


  • Regex: Regular expression.


For verification:





  • "Regex permutations without repetition" The answer creates JavaScript code instead of a regex, assuming this would be more simple.


  • "How to find all permutations of a given word in a given text" – The answer doesn't use regexes either.


  • "Regex to match all {1, 2, 3, 4} without repetition" – The answer uses regexes, but it's neither adaptable nor simple.

  • This answer even claims: "A regular expression cannot do what you're asking for. It cannot generate permutations from a string".


The kind of solution I am searching for



It should have the form:




  • »aabc« (or anything else you could use a opening and closing parentheses)

  • (aabc)! (similar to (abc)? but with another symbol in the end)

  • [aabc]! (similar to [abc]+ but with another symbol in the end)


Advantages of these solutions



They are:




  • easy

  • adaptable

  • reusable


Why this should exist




  • Regexes are a way to describe a grammar of a regular language. They have the full power to be any kind of regular language.

  • Let's say, regular languages are powerful enough for permutations (proof below) – why is there no easy way to express this?


So my question is:




  • (Why) Is my proof wrong?

  • If it is right: Why is there no easy way to express permutations?


The proof




  • Regular expressions are one way to note the grammar of a regular language. They can describe any regular languages grammar.

  • Another way to describe any regular languages (that have a finite number of letters within their alphabet) grammar are non-deterministic Automatons (with a finite number of states).


Having a finite number of letters I can create this automaton: (Example. Formal: see below)



Grammar that accepts permutations of "abbc":



(sry for numbers on top, maybe someone knows how to make this part looking better)



s -> ah¹



s -> bh²



s -> ch³



h¹ -> bh¹¹



h¹ -> ch¹²



h² -> ah¹¹ (no typo! equivalence)



h² -> bh²²



h² -> ch²³



h³ -> ah¹²



h³ -> bh²³



h¹¹ -> bc



h¹¹ -> cb



h¹² -> bb



h²² -> ac



h²² -> ca



h²³ -> ab



h²³ -> ba



More formal: (using a finite-state-automaton but this could be made with grammar as well)




  • A word q (with finite length) to which any permutation should reach an accepting state.

  • X is the finite alphabet.

  • Set of states S contains any order of letters up to the length of q. (So the size of S is finite.) Plus one state of "any longer word".

  • state transition function d which takes a letter and moves on the state that corresponds to the now read part of the word.

  • F is a set of that states that are exact permutations of q.


So it is possible to create a finite-state automaton for accepting permutations of a given word.



Moving on with the proof



So I have proven that regular languages have the power to check for permutations, haven't I?



So why is there no approach to reach this with Regexes? It's a useful functionality.










share|cite|improve this question









New contributor




Asqiir is a new contributor to this site. Take care in asking for clarification, commenting, and answering.
Check out our Code of Conduct.











The Problem



There is no easy way to get a permutation with a regex.





  • Permutation: Getting a word $$w=x_1…x_n$$ ("aabc") to another order, without changing number or kind of letters.


  • Regex: Regular expression.


For verification:





  • "Regex permutations without repetition" The answer creates JavaScript code instead of a regex, assuming this would be more simple.


  • "How to find all permutations of a given word in a given text" – The answer doesn't use regexes either.


  • "Regex to match all {1, 2, 3, 4} without repetition" – The answer uses regexes, but it's neither adaptable nor simple.

  • This answer even claims: "A regular expression cannot do what you're asking for. It cannot generate permutations from a string".


The kind of solution I am searching for



It should have the form:




  • »aabc« (or anything else you could use a opening and closing parentheses)

  • (aabc)! (similar to (abc)? but with another symbol in the end)

  • [aabc]! (similar to [abc]+ but with another symbol in the end)


Advantages of these solutions



They are:




  • easy

  • adaptable

  • reusable


Why this should exist




  • Regexes are a way to describe a grammar of a regular language. They have the full power to be any kind of regular language.

  • Let's say, regular languages are powerful enough for permutations (proof below) – why is there no easy way to express this?


So my question is:




  • (Why) Is my proof wrong?

  • If it is right: Why is there no easy way to express permutations?


The proof




  • Regular expressions are one way to note the grammar of a regular language. They can describe any regular languages grammar.

  • Another way to describe any regular languages (that have a finite number of letters within their alphabet) grammar are non-deterministic Automatons (with a finite number of states).


Having a finite number of letters I can create this automaton: (Example. Formal: see below)



Grammar that accepts permutations of "abbc":



(sry for numbers on top, maybe someone knows how to make this part looking better)



s -> ah¹



s -> bh²



s -> ch³



h¹ -> bh¹¹



h¹ -> ch¹²



h² -> ah¹¹ (no typo! equivalence)



h² -> bh²²



h² -> ch²³



h³ -> ah¹²



h³ -> bh²³



h¹¹ -> bc



h¹¹ -> cb



h¹² -> bb



h²² -> ac



h²² -> ca



h²³ -> ab



h²³ -> ba



More formal: (using a finite-state-automaton but this could be made with grammar as well)




  • A word q (with finite length) to which any permutation should reach an accepting state.

  • X is the finite alphabet.

  • Set of states S contains any order of letters up to the length of q. (So the size of S is finite.) Plus one state of "any longer word".

  • state transition function d which takes a letter and moves on the state that corresponds to the now read part of the word.

  • F is a set of that states that are exact permutations of q.


So it is possible to create a finite-state automaton for accepting permutations of a given word.



Moving on with the proof



So I have proven that regular languages have the power to check for permutations, haven't I?



So why is there no approach to reach this with Regexes? It's a useful functionality.







formal-languages regular-languages regular-expressions






share|cite|improve this question









New contributor




Asqiir is a new contributor to this site. Take care in asking for clarification, commenting, and answering.
Check out our Code of Conduct.











share|cite|improve this question









New contributor




Asqiir is a new contributor to this site. Take care in asking for clarification, commenting, and answering.
Check out our Code of Conduct.









share|cite|improve this question




share|cite|improve this question








edited yesterday









Glorfindel

1741110




1741110






New contributor




Asqiir is a new contributor to this site. Take care in asking for clarification, commenting, and answering.
Check out our Code of Conduct.









asked 2 days ago









Asqiir

1518




1518




New contributor




Asqiir is a new contributor to this site. Take care in asking for clarification, commenting, and answering.
Check out our Code of Conduct.





New contributor





Asqiir is a new contributor to this site. Take care in asking for clarification, commenting, and answering.
Check out our Code of Conduct.






Asqiir is a new contributor to this site. Take care in asking for clarification, commenting, and answering.
Check out our Code of Conduct.








  • 9




    You can list all permutations of your word with a regular expression. The resulting expression will be quite large, but will definitely be a regular expression.
    – Yuval Filmus
    2 days ago






  • 6




    I suggest ignoring all answers about Theory of Computation on stackoverflow. This isn't that site's specialty.
    – Yuval Filmus
    2 days ago












  • The answer on your linked page here - stackoverflow.com/a/3102205/6936386 - seems to be easily adaptable and not too complicated: ^(a()|a()|b()|c()){4}2345$ seems to work (see regex101.com/r/9URPpg/4/tests).
    – boboquack
    yesterday








  • 5




    @boboquack That isn't a regular expression in the sense in which the term is used in computer science. (This sort of thing is exactly why Yuval suggests not trusting Stack Overflow answers about theoretical CS.)
    – David Richerby
    17 hours ago














  • 9




    You can list all permutations of your word with a regular expression. The resulting expression will be quite large, but will definitely be a regular expression.
    – Yuval Filmus
    2 days ago






  • 6




    I suggest ignoring all answers about Theory of Computation on stackoverflow. This isn't that site's specialty.
    – Yuval Filmus
    2 days ago












  • The answer on your linked page here - stackoverflow.com/a/3102205/6936386 - seems to be easily adaptable and not too complicated: ^(a()|a()|b()|c()){4}2345$ seems to work (see regex101.com/r/9URPpg/4/tests).
    – boboquack
    yesterday








  • 5




    @boboquack That isn't a regular expression in the sense in which the term is used in computer science. (This sort of thing is exactly why Yuval suggests not trusting Stack Overflow answers about theoretical CS.)
    – David Richerby
    17 hours ago








9




9




You can list all permutations of your word with a regular expression. The resulting expression will be quite large, but will definitely be a regular expression.
– Yuval Filmus
2 days ago




You can list all permutations of your word with a regular expression. The resulting expression will be quite large, but will definitely be a regular expression.
– Yuval Filmus
2 days ago




6




6




I suggest ignoring all answers about Theory of Computation on stackoverflow. This isn't that site's specialty.
– Yuval Filmus
2 days ago






I suggest ignoring all answers about Theory of Computation on stackoverflow. This isn't that site's specialty.
– Yuval Filmus
2 days ago














The answer on your linked page here - stackoverflow.com/a/3102205/6936386 - seems to be easily adaptable and not too complicated: ^(a()|a()|b()|c()){4}2345$ seems to work (see regex101.com/r/9URPpg/4/tests).
– boboquack
yesterday






The answer on your linked page here - stackoverflow.com/a/3102205/6936386 - seems to be easily adaptable and not too complicated: ^(a()|a()|b()|c()){4}2345$ seems to work (see regex101.com/r/9URPpg/4/tests).
– boboquack
yesterday






5




5




@boboquack That isn't a regular expression in the sense in which the term is used in computer science. (This sort of thing is exactly why Yuval suggests not trusting Stack Overflow answers about theoretical CS.)
– David Richerby
17 hours ago




@boboquack That isn't a regular expression in the sense in which the term is used in computer science. (This sort of thing is exactly why Yuval suggests not trusting Stack Overflow answers about theoretical CS.)
– David Richerby
17 hours ago










3 Answers
3






active

oldest

votes

















up vote
31
down vote



accepted










The fundamental theorems of formal language theory are that regular expressions, regular grammars, deterministic finite automata (DFAs) and nondeterministic finite automata (NFAs) all describe the same kinds of languages: namely the regular languages. The fact that we can describe these languages in so many completely different ways suggests that there's something natural and important about these languages, in the same way that the equivalence of Turing machines, the lambda calculus and all kinds of other things suggests that the computable languages are natural and important. They're not just an artifact of whatever random decisions the original discoverer made.



Suppose we add a new rule for creating regular expressions: if $R$ is a regular expression, then $pi(R)$ is a regular expression, and it matches every permutation of every string matched by $R$. So, for example, $L(pi(abc)) = {abc, acb, bac, bca, cab, cba}$. The problem is that this breaks the fundamental equivalences described above. $Lbig(pi((ab)^*))big)$ is the language of strings that contain an equal number of $a$s and $b$s and this isn't a regular language. Compare this with, for example, adding a negation or reversal operator to regular expressions, which doesn't change the class of languages that are accepted.



So, to answer the title question, regular expressions can't do permutations and we don't add that ability because then regular expressions wouldn't match regular languages. Having said that, it's possible that "regular expressions with permutations" would also be an interesting class of languages with lots of different characterizations.






share|cite|improve this answer























  • But L((ab)*) is no regular language either – so L(perm((ab)*)) can't be one. ( (ab)* is no regular language since there is no kind of memory to remember how many opening "a"s there are, so with a finite number of states you cannot put the same number of "b"s.)
    – Asqiir
    yesterday








  • 7




    @Asqiir $L((ab)^*)$ is regular because it's the language matched by the given regular expression. You seem to have misunderstood what the language is. It's ${varepsilon, ab, abab, ababab, abababab, dots}$, not ${varepsilon,ab,aabb,aaabbb,aaaabbbb,dots}$. The latter language isn't regular but that's not the language we're talking about.
    – David Richerby
    yesterday






  • 3




    @Asqiir It is that language because it's what you get by applying any permutation you want to a language in which every string contained the same number of $a$s and $b$s; it's not regular because you can prove that using the pumping lemma.
    – David Richerby
    yesterday






  • 1




    You are completely right. I missed the point of "putting regular expressions into each other", I only thought about "permutating a fixed word" not "permutating another regex" which of course isn't possible.
    – Asqiir
    yesterday






  • 1




    Perhaps regular expressions with permutations describe a class of languages with interesting properties, but I've never run into need for the ! operator in practice, and I suppose few people have, as it's easy to implement, and no implementation of extended regular expressions I've seen supports it.
    – reinierpost
    17 hours ago




















up vote
13
down vote














So my question is:




  • (Why) Is my proof wrong?

  • If it is right: Why is there no easy way to express permutations?




Your "proof" only looked at permutations of single words, which are finite languages.



Every finite language is regular (e.g. just by listing all of the members with a | inbetween), but there are infinite regular languages (and those are generally the more interesting ones).



As soon as you get a regular expression (or grammar/automaton) which accepts an infinite language (i.e. an expression with the * operator, or an automaton with a loop), your construction doesn't work anymore (you get an infinite grammar/automaton).



The answer by David Richerby provided an example of a regular language whose permutation language is not regular anymore – all such examples are infinite languages.






share|cite|improve this answer




























    up vote
    7
    down vote













    Let $Sigma$ be an alphabet of size $n$. A regular expression describing all permutations of $Sigma$ must have exponential size. This follows from Theorem 9 in Lower bounds for context-free grammars, which gives an exponential lower bound on the much stronger model of context-free grammars (a regular expression of size $m$ can be converted to a context-free grammar of size $O(m)$).



    So in some sense, there is no succinct way to specify all permutations of a word.





    Here is a simple proof for a $tildeOmega(2^n)$ lower bound on the size of a regular expression for the language of all permutations of an alphabet $Sigma$ of size $n$. Since a regular expression of size $m$ can be converted to an NFA with $O(m)$ states, it suffices to give a lower bound on NFAs.



    We will use the fooling set method, which we now describe. Let $L$ be a language, and let $(x_i,y_i)_{1 leq i leq N}$ be a collection of pairs of words such that:





    • $x_iy_i in L$.

    • If $i neq j$ then either $x_i y_j notin L$ or $x_j y_i notin L$.


    Then every NFA for $L$ contains at least $N$ states. Indeed, consider any NFA for $L$. For each $i$, consider some accepting computation of $x_iy_i$, and let $q_i$ be the state that the NFA is in after reading $x_i$ in this accepting computation. I claim that $q_i neq q_j$ for $i neq j$. Indeed, if $q_i = q_j$ then a "cut-and-paste" argument shows that both $x_iy_j$ and $x_jy_i$ are in $L$, contrary to assumption.



    Now we can prove the lower bound on NFAs for the language $L_n$ of all permutations of the symbols $sigma_1,ldots,sigma_n$; we assume for simplicity of exposition that $n$ is even. For each subset $S$ of $sigma_1,ldots,sigma_n$ of size $n/2$, let $x_S$ consist of all symbols in $S$ in some arbitrary order, and let $y_S$ consist of all symbols not in $S$ in some arbitrary order. Clearly $x_Sy_S in L_n$. Conversely, if $S neq T$ then $x_S y_T notin L_n$. This shows that every NFA for $L_n$ contains at least $binom{n}{n/2} = Omega(2^n/sqrt{n})$ many states.






    share|cite|improve this answer























    • Does this mean 1) in theory it would be possible to let »abc« match all {abc, acb, bac, bca, cab, cba} but it's just not efficient and would make them too slow since »abc« would expand exponentially to (abc|acb|bac|bca|cab|cba)? or 2) The kind of automaton I need is not able to specify all permutations for a given word?
      – Asqiir
      2 days ago






    • 1




      Here is a regular expression that matches all permutations of $abc$: $abc+acd+bac+bca+cab+cba$. You can easily convert this to a DFA having $1+3+6+6+1 = 17$ states. The same works for $abcdefghij$.
      – Yuval Filmus
      2 days ago






    • 1




      What I understood: In theory, regular languages are able to accept permutations (so are regular expressions). There is just no "simple way" to write "permutation of abc" like »abc«. (For whatever reasons.)
      – Asqiir
      2 days ago






    • 1




      Yes, that's a good summary. I'll see if I can come up with a simpler argument for regular expressions.
      – Yuval Filmus
      2 days ago










    • I added a simple argument using the fooling set method.
      – Yuval Filmus
      2 days ago











    Your Answer





    StackExchange.ifUsing("editor", function () {
    return StackExchange.using("mathjaxEditing", function () {
    StackExchange.MarkdownEditor.creationCallbacks.add(function (editor, postfix) {
    StackExchange.mathjaxEditing.prepareWmdForMathJax(editor, postfix, [["$", "$"], ["\\(","\\)"]]);
    });
    });
    }, "mathjax-editing");

    StackExchange.ready(function() {
    var channelOptions = {
    tags: "".split(" "),
    id: "419"
    };
    initTagRenderer("".split(" "), "".split(" "), channelOptions);

    StackExchange.using("externalEditor", function() {
    // Have to fire editor after snippets, if snippets enabled
    if (StackExchange.settings.snippets.snippetsEnabled) {
    StackExchange.using("snippets", function() {
    createEditor();
    });
    }
    else {
    createEditor();
    }
    });

    function createEditor() {
    StackExchange.prepareEditor({
    heartbeatType: 'answer',
    convertImagesToLinks: false,
    noModals: true,
    showLowRepImageUploadWarning: true,
    reputationToPostImages: null,
    bindNavPrevention: true,
    postfix: "",
    imageUploader: {
    brandingHtml: "Powered by u003ca class="icon-imgur-white" href="https://imgur.com/"u003eu003c/au003e",
    contentPolicyHtml: "User contributions licensed under u003ca href="https://creativecommons.org/licenses/by-sa/3.0/"u003ecc by-sa 3.0 with attribution requiredu003c/au003e u003ca href="https://stackoverflow.com/legal/content-policy"u003e(content policy)u003c/au003e",
    allowUrls: true
    },
    onDemand: true,
    discardSelector: ".discard-answer"
    ,immediatelyShowMarkdownHelp:true
    });


    }
    });






    Asqiir is a new contributor. Be nice, and check out our Code of Conduct.










     

    draft saved


    draft discarded


















    StackExchange.ready(
    function () {
    StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fcs.stackexchange.com%2fquestions%2f100206%2fwhy-is-there-no-permutation-in-regexes-even-if-regular-languages-seem-to-be-ab%23new-answer', 'question_page');
    }
    );

    Post as a guest















    Required, but never shown

























    3 Answers
    3






    active

    oldest

    votes








    3 Answers
    3






    active

    oldest

    votes









    active

    oldest

    votes






    active

    oldest

    votes








    up vote
    31
    down vote



    accepted










    The fundamental theorems of formal language theory are that regular expressions, regular grammars, deterministic finite automata (DFAs) and nondeterministic finite automata (NFAs) all describe the same kinds of languages: namely the regular languages. The fact that we can describe these languages in so many completely different ways suggests that there's something natural and important about these languages, in the same way that the equivalence of Turing machines, the lambda calculus and all kinds of other things suggests that the computable languages are natural and important. They're not just an artifact of whatever random decisions the original discoverer made.



    Suppose we add a new rule for creating regular expressions: if $R$ is a regular expression, then $pi(R)$ is a regular expression, and it matches every permutation of every string matched by $R$. So, for example, $L(pi(abc)) = {abc, acb, bac, bca, cab, cba}$. The problem is that this breaks the fundamental equivalences described above. $Lbig(pi((ab)^*))big)$ is the language of strings that contain an equal number of $a$s and $b$s and this isn't a regular language. Compare this with, for example, adding a negation or reversal operator to regular expressions, which doesn't change the class of languages that are accepted.



    So, to answer the title question, regular expressions can't do permutations and we don't add that ability because then regular expressions wouldn't match regular languages. Having said that, it's possible that "regular expressions with permutations" would also be an interesting class of languages with lots of different characterizations.






    share|cite|improve this answer























    • But L((ab)*) is no regular language either – so L(perm((ab)*)) can't be one. ( (ab)* is no regular language since there is no kind of memory to remember how many opening "a"s there are, so with a finite number of states you cannot put the same number of "b"s.)
      – Asqiir
      yesterday








    • 7




      @Asqiir $L((ab)^*)$ is regular because it's the language matched by the given regular expression. You seem to have misunderstood what the language is. It's ${varepsilon, ab, abab, ababab, abababab, dots}$, not ${varepsilon,ab,aabb,aaabbb,aaaabbbb,dots}$. The latter language isn't regular but that's not the language we're talking about.
      – David Richerby
      yesterday






    • 3




      @Asqiir It is that language because it's what you get by applying any permutation you want to a language in which every string contained the same number of $a$s and $b$s; it's not regular because you can prove that using the pumping lemma.
      – David Richerby
      yesterday






    • 1




      You are completely right. I missed the point of "putting regular expressions into each other", I only thought about "permutating a fixed word" not "permutating another regex" which of course isn't possible.
      – Asqiir
      yesterday






    • 1




      Perhaps regular expressions with permutations describe a class of languages with interesting properties, but I've never run into need for the ! operator in practice, and I suppose few people have, as it's easy to implement, and no implementation of extended regular expressions I've seen supports it.
      – reinierpost
      17 hours ago

















    up vote
    31
    down vote



    accepted










    The fundamental theorems of formal language theory are that regular expressions, regular grammars, deterministic finite automata (DFAs) and nondeterministic finite automata (NFAs) all describe the same kinds of languages: namely the regular languages. The fact that we can describe these languages in so many completely different ways suggests that there's something natural and important about these languages, in the same way that the equivalence of Turing machines, the lambda calculus and all kinds of other things suggests that the computable languages are natural and important. They're not just an artifact of whatever random decisions the original discoverer made.



    Suppose we add a new rule for creating regular expressions: if $R$ is a regular expression, then $pi(R)$ is a regular expression, and it matches every permutation of every string matched by $R$. So, for example, $L(pi(abc)) = {abc, acb, bac, bca, cab, cba}$. The problem is that this breaks the fundamental equivalences described above. $Lbig(pi((ab)^*))big)$ is the language of strings that contain an equal number of $a$s and $b$s and this isn't a regular language. Compare this with, for example, adding a negation or reversal operator to regular expressions, which doesn't change the class of languages that are accepted.



    So, to answer the title question, regular expressions can't do permutations and we don't add that ability because then regular expressions wouldn't match regular languages. Having said that, it's possible that "regular expressions with permutations" would also be an interesting class of languages with lots of different characterizations.






    share|cite|improve this answer























    • But L((ab)*) is no regular language either – so L(perm((ab)*)) can't be one. ( (ab)* is no regular language since there is no kind of memory to remember how many opening "a"s there are, so with a finite number of states you cannot put the same number of "b"s.)
      – Asqiir
      yesterday








    • 7




      @Asqiir $L((ab)^*)$ is regular because it's the language matched by the given regular expression. You seem to have misunderstood what the language is. It's ${varepsilon, ab, abab, ababab, abababab, dots}$, not ${varepsilon,ab,aabb,aaabbb,aaaabbbb,dots}$. The latter language isn't regular but that's not the language we're talking about.
      – David Richerby
      yesterday






    • 3




      @Asqiir It is that language because it's what you get by applying any permutation you want to a language in which every string contained the same number of $a$s and $b$s; it's not regular because you can prove that using the pumping lemma.
      – David Richerby
      yesterday






    • 1




      You are completely right. I missed the point of "putting regular expressions into each other", I only thought about "permutating a fixed word" not "permutating another regex" which of course isn't possible.
      – Asqiir
      yesterday






    • 1




      Perhaps regular expressions with permutations describe a class of languages with interesting properties, but I've never run into need for the ! operator in practice, and I suppose few people have, as it's easy to implement, and no implementation of extended regular expressions I've seen supports it.
      – reinierpost
      17 hours ago















    up vote
    31
    down vote



    accepted







    up vote
    31
    down vote



    accepted






    The fundamental theorems of formal language theory are that regular expressions, regular grammars, deterministic finite automata (DFAs) and nondeterministic finite automata (NFAs) all describe the same kinds of languages: namely the regular languages. The fact that we can describe these languages in so many completely different ways suggests that there's something natural and important about these languages, in the same way that the equivalence of Turing machines, the lambda calculus and all kinds of other things suggests that the computable languages are natural and important. They're not just an artifact of whatever random decisions the original discoverer made.



    Suppose we add a new rule for creating regular expressions: if $R$ is a regular expression, then $pi(R)$ is a regular expression, and it matches every permutation of every string matched by $R$. So, for example, $L(pi(abc)) = {abc, acb, bac, bca, cab, cba}$. The problem is that this breaks the fundamental equivalences described above. $Lbig(pi((ab)^*))big)$ is the language of strings that contain an equal number of $a$s and $b$s and this isn't a regular language. Compare this with, for example, adding a negation or reversal operator to regular expressions, which doesn't change the class of languages that are accepted.



    So, to answer the title question, regular expressions can't do permutations and we don't add that ability because then regular expressions wouldn't match regular languages. Having said that, it's possible that "regular expressions with permutations" would also be an interesting class of languages with lots of different characterizations.






    share|cite|improve this answer














    The fundamental theorems of formal language theory are that regular expressions, regular grammars, deterministic finite automata (DFAs) and nondeterministic finite automata (NFAs) all describe the same kinds of languages: namely the regular languages. The fact that we can describe these languages in so many completely different ways suggests that there's something natural and important about these languages, in the same way that the equivalence of Turing machines, the lambda calculus and all kinds of other things suggests that the computable languages are natural and important. They're not just an artifact of whatever random decisions the original discoverer made.



    Suppose we add a new rule for creating regular expressions: if $R$ is a regular expression, then $pi(R)$ is a regular expression, and it matches every permutation of every string matched by $R$. So, for example, $L(pi(abc)) = {abc, acb, bac, bca, cab, cba}$. The problem is that this breaks the fundamental equivalences described above. $Lbig(pi((ab)^*))big)$ is the language of strings that contain an equal number of $a$s and $b$s and this isn't a regular language. Compare this with, for example, adding a negation or reversal operator to regular expressions, which doesn't change the class of languages that are accepted.



    So, to answer the title question, regular expressions can't do permutations and we don't add that ability because then regular expressions wouldn't match regular languages. Having said that, it's possible that "regular expressions with permutations" would also be an interesting class of languages with lots of different characterizations.







    share|cite|improve this answer














    share|cite|improve this answer



    share|cite|improve this answer








    edited 2 hours ago









    Laurel

    1034




    1034










    answered 2 days ago









    David Richerby

    64.3k1597185




    64.3k1597185












    • But L((ab)*) is no regular language either – so L(perm((ab)*)) can't be one. ( (ab)* is no regular language since there is no kind of memory to remember how many opening "a"s there are, so with a finite number of states you cannot put the same number of "b"s.)
      – Asqiir
      yesterday








    • 7




      @Asqiir $L((ab)^*)$ is regular because it's the language matched by the given regular expression. You seem to have misunderstood what the language is. It's ${varepsilon, ab, abab, ababab, abababab, dots}$, not ${varepsilon,ab,aabb,aaabbb,aaaabbbb,dots}$. The latter language isn't regular but that's not the language we're talking about.
      – David Richerby
      yesterday






    • 3




      @Asqiir It is that language because it's what you get by applying any permutation you want to a language in which every string contained the same number of $a$s and $b$s; it's not regular because you can prove that using the pumping lemma.
      – David Richerby
      yesterday






    • 1




      You are completely right. I missed the point of "putting regular expressions into each other", I only thought about "permutating a fixed word" not "permutating another regex" which of course isn't possible.
      – Asqiir
      yesterday






    • 1




      Perhaps regular expressions with permutations describe a class of languages with interesting properties, but I've never run into need for the ! operator in practice, and I suppose few people have, as it's easy to implement, and no implementation of extended regular expressions I've seen supports it.
      – reinierpost
      17 hours ago




















    • But L((ab)*) is no regular language either – so L(perm((ab)*)) can't be one. ( (ab)* is no regular language since there is no kind of memory to remember how many opening "a"s there are, so with a finite number of states you cannot put the same number of "b"s.)
      – Asqiir
      yesterday








    • 7




      @Asqiir $L((ab)^*)$ is regular because it's the language matched by the given regular expression. You seem to have misunderstood what the language is. It's ${varepsilon, ab, abab, ababab, abababab, dots}$, not ${varepsilon,ab,aabb,aaabbb,aaaabbbb,dots}$. The latter language isn't regular but that's not the language we're talking about.
      – David Richerby
      yesterday






    • 3




      @Asqiir It is that language because it's what you get by applying any permutation you want to a language in which every string contained the same number of $a$s and $b$s; it's not regular because you can prove that using the pumping lemma.
      – David Richerby
      yesterday






    • 1




      You are completely right. I missed the point of "putting regular expressions into each other", I only thought about "permutating a fixed word" not "permutating another regex" which of course isn't possible.
      – Asqiir
      yesterday






    • 1




      Perhaps regular expressions with permutations describe a class of languages with interesting properties, but I've never run into need for the ! operator in practice, and I suppose few people have, as it's easy to implement, and no implementation of extended regular expressions I've seen supports it.
      – reinierpost
      17 hours ago


















    But L((ab)*) is no regular language either – so L(perm((ab)*)) can't be one. ( (ab)* is no regular language since there is no kind of memory to remember how many opening "a"s there are, so with a finite number of states you cannot put the same number of "b"s.)
    – Asqiir
    yesterday






    But L((ab)*) is no regular language either – so L(perm((ab)*)) can't be one. ( (ab)* is no regular language since there is no kind of memory to remember how many opening "a"s there are, so with a finite number of states you cannot put the same number of "b"s.)
    – Asqiir
    yesterday






    7




    7




    @Asqiir $L((ab)^*)$ is regular because it's the language matched by the given regular expression. You seem to have misunderstood what the language is. It's ${varepsilon, ab, abab, ababab, abababab, dots}$, not ${varepsilon,ab,aabb,aaabbb,aaaabbbb,dots}$. The latter language isn't regular but that's not the language we're talking about.
    – David Richerby
    yesterday




    @Asqiir $L((ab)^*)$ is regular because it's the language matched by the given regular expression. You seem to have misunderstood what the language is. It's ${varepsilon, ab, abab, ababab, abababab, dots}$, not ${varepsilon,ab,aabb,aaabbb,aaaabbbb,dots}$. The latter language isn't regular but that's not the language we're talking about.
    – David Richerby
    yesterday




    3




    3




    @Asqiir It is that language because it's what you get by applying any permutation you want to a language in which every string contained the same number of $a$s and $b$s; it's not regular because you can prove that using the pumping lemma.
    – David Richerby
    yesterday




    @Asqiir It is that language because it's what you get by applying any permutation you want to a language in which every string contained the same number of $a$s and $b$s; it's not regular because you can prove that using the pumping lemma.
    – David Richerby
    yesterday




    1




    1




    You are completely right. I missed the point of "putting regular expressions into each other", I only thought about "permutating a fixed word" not "permutating another regex" which of course isn't possible.
    – Asqiir
    yesterday




    You are completely right. I missed the point of "putting regular expressions into each other", I only thought about "permutating a fixed word" not "permutating another regex" which of course isn't possible.
    – Asqiir
    yesterday




    1




    1




    Perhaps regular expressions with permutations describe a class of languages with interesting properties, but I've never run into need for the ! operator in practice, and I suppose few people have, as it's easy to implement, and no implementation of extended regular expressions I've seen supports it.
    – reinierpost
    17 hours ago






    Perhaps regular expressions with permutations describe a class of languages with interesting properties, but I've never run into need for the ! operator in practice, and I suppose few people have, as it's easy to implement, and no implementation of extended regular expressions I've seen supports it.
    – reinierpost
    17 hours ago












    up vote
    13
    down vote














    So my question is:




    • (Why) Is my proof wrong?

    • If it is right: Why is there no easy way to express permutations?




    Your "proof" only looked at permutations of single words, which are finite languages.



    Every finite language is regular (e.g. just by listing all of the members with a | inbetween), but there are infinite regular languages (and those are generally the more interesting ones).



    As soon as you get a regular expression (or grammar/automaton) which accepts an infinite language (i.e. an expression with the * operator, or an automaton with a loop), your construction doesn't work anymore (you get an infinite grammar/automaton).



    The answer by David Richerby provided an example of a regular language whose permutation language is not regular anymore – all such examples are infinite languages.






    share|cite|improve this answer

























      up vote
      13
      down vote














      So my question is:




      • (Why) Is my proof wrong?

      • If it is right: Why is there no easy way to express permutations?




      Your "proof" only looked at permutations of single words, which are finite languages.



      Every finite language is regular (e.g. just by listing all of the members with a | inbetween), but there are infinite regular languages (and those are generally the more interesting ones).



      As soon as you get a regular expression (or grammar/automaton) which accepts an infinite language (i.e. an expression with the * operator, or an automaton with a loop), your construction doesn't work anymore (you get an infinite grammar/automaton).



      The answer by David Richerby provided an example of a regular language whose permutation language is not regular anymore – all such examples are infinite languages.






      share|cite|improve this answer























        up vote
        13
        down vote










        up vote
        13
        down vote










        So my question is:




        • (Why) Is my proof wrong?

        • If it is right: Why is there no easy way to express permutations?




        Your "proof" only looked at permutations of single words, which are finite languages.



        Every finite language is regular (e.g. just by listing all of the members with a | inbetween), but there are infinite regular languages (and those are generally the more interesting ones).



        As soon as you get a regular expression (or grammar/automaton) which accepts an infinite language (i.e. an expression with the * operator, or an automaton with a loop), your construction doesn't work anymore (you get an infinite grammar/automaton).



        The answer by David Richerby provided an example of a regular language whose permutation language is not regular anymore – all such examples are infinite languages.






        share|cite|improve this answer













        So my question is:




        • (Why) Is my proof wrong?

        • If it is right: Why is there no easy way to express permutations?




        Your "proof" only looked at permutations of single words, which are finite languages.



        Every finite language is regular (e.g. just by listing all of the members with a | inbetween), but there are infinite regular languages (and those are generally the more interesting ones).



        As soon as you get a regular expression (or grammar/automaton) which accepts an infinite language (i.e. an expression with the * operator, or an automaton with a loop), your construction doesn't work anymore (you get an infinite grammar/automaton).



        The answer by David Richerby provided an example of a regular language whose permutation language is not regular anymore – all such examples are infinite languages.







        share|cite|improve this answer












        share|cite|improve this answer



        share|cite|improve this answer










        answered yesterday









        Paŭlo Ebermann

        47329




        47329






















            up vote
            7
            down vote













            Let $Sigma$ be an alphabet of size $n$. A regular expression describing all permutations of $Sigma$ must have exponential size. This follows from Theorem 9 in Lower bounds for context-free grammars, which gives an exponential lower bound on the much stronger model of context-free grammars (a regular expression of size $m$ can be converted to a context-free grammar of size $O(m)$).



            So in some sense, there is no succinct way to specify all permutations of a word.





            Here is a simple proof for a $tildeOmega(2^n)$ lower bound on the size of a regular expression for the language of all permutations of an alphabet $Sigma$ of size $n$. Since a regular expression of size $m$ can be converted to an NFA with $O(m)$ states, it suffices to give a lower bound on NFAs.



            We will use the fooling set method, which we now describe. Let $L$ be a language, and let $(x_i,y_i)_{1 leq i leq N}$ be a collection of pairs of words such that:





            • $x_iy_i in L$.

            • If $i neq j$ then either $x_i y_j notin L$ or $x_j y_i notin L$.


            Then every NFA for $L$ contains at least $N$ states. Indeed, consider any NFA for $L$. For each $i$, consider some accepting computation of $x_iy_i$, and let $q_i$ be the state that the NFA is in after reading $x_i$ in this accepting computation. I claim that $q_i neq q_j$ for $i neq j$. Indeed, if $q_i = q_j$ then a "cut-and-paste" argument shows that both $x_iy_j$ and $x_jy_i$ are in $L$, contrary to assumption.



            Now we can prove the lower bound on NFAs for the language $L_n$ of all permutations of the symbols $sigma_1,ldots,sigma_n$; we assume for simplicity of exposition that $n$ is even. For each subset $S$ of $sigma_1,ldots,sigma_n$ of size $n/2$, let $x_S$ consist of all symbols in $S$ in some arbitrary order, and let $y_S$ consist of all symbols not in $S$ in some arbitrary order. Clearly $x_Sy_S in L_n$. Conversely, if $S neq T$ then $x_S y_T notin L_n$. This shows that every NFA for $L_n$ contains at least $binom{n}{n/2} = Omega(2^n/sqrt{n})$ many states.






            share|cite|improve this answer























            • Does this mean 1) in theory it would be possible to let »abc« match all {abc, acb, bac, bca, cab, cba} but it's just not efficient and would make them too slow since »abc« would expand exponentially to (abc|acb|bac|bca|cab|cba)? or 2) The kind of automaton I need is not able to specify all permutations for a given word?
              – Asqiir
              2 days ago






            • 1




              Here is a regular expression that matches all permutations of $abc$: $abc+acd+bac+bca+cab+cba$. You can easily convert this to a DFA having $1+3+6+6+1 = 17$ states. The same works for $abcdefghij$.
              – Yuval Filmus
              2 days ago






            • 1




              What I understood: In theory, regular languages are able to accept permutations (so are regular expressions). There is just no "simple way" to write "permutation of abc" like »abc«. (For whatever reasons.)
              – Asqiir
              2 days ago






            • 1




              Yes, that's a good summary. I'll see if I can come up with a simpler argument for regular expressions.
              – Yuval Filmus
              2 days ago










            • I added a simple argument using the fooling set method.
              – Yuval Filmus
              2 days ago















            up vote
            7
            down vote













            Let $Sigma$ be an alphabet of size $n$. A regular expression describing all permutations of $Sigma$ must have exponential size. This follows from Theorem 9 in Lower bounds for context-free grammars, which gives an exponential lower bound on the much stronger model of context-free grammars (a regular expression of size $m$ can be converted to a context-free grammar of size $O(m)$).



            So in some sense, there is no succinct way to specify all permutations of a word.





            Here is a simple proof for a $tildeOmega(2^n)$ lower bound on the size of a regular expression for the language of all permutations of an alphabet $Sigma$ of size $n$. Since a regular expression of size $m$ can be converted to an NFA with $O(m)$ states, it suffices to give a lower bound on NFAs.



            We will use the fooling set method, which we now describe. Let $L$ be a language, and let $(x_i,y_i)_{1 leq i leq N}$ be a collection of pairs of words such that:





            • $x_iy_i in L$.

            • If $i neq j$ then either $x_i y_j notin L$ or $x_j y_i notin L$.


            Then every NFA for $L$ contains at least $N$ states. Indeed, consider any NFA for $L$. For each $i$, consider some accepting computation of $x_iy_i$, and let $q_i$ be the state that the NFA is in after reading $x_i$ in this accepting computation. I claim that $q_i neq q_j$ for $i neq j$. Indeed, if $q_i = q_j$ then a "cut-and-paste" argument shows that both $x_iy_j$ and $x_jy_i$ are in $L$, contrary to assumption.



            Now we can prove the lower bound on NFAs for the language $L_n$ of all permutations of the symbols $sigma_1,ldots,sigma_n$; we assume for simplicity of exposition that $n$ is even. For each subset $S$ of $sigma_1,ldots,sigma_n$ of size $n/2$, let $x_S$ consist of all symbols in $S$ in some arbitrary order, and let $y_S$ consist of all symbols not in $S$ in some arbitrary order. Clearly $x_Sy_S in L_n$. Conversely, if $S neq T$ then $x_S y_T notin L_n$. This shows that every NFA for $L_n$ contains at least $binom{n}{n/2} = Omega(2^n/sqrt{n})$ many states.






            share|cite|improve this answer























            • Does this mean 1) in theory it would be possible to let »abc« match all {abc, acb, bac, bca, cab, cba} but it's just not efficient and would make them too slow since »abc« would expand exponentially to (abc|acb|bac|bca|cab|cba)? or 2) The kind of automaton I need is not able to specify all permutations for a given word?
              – Asqiir
              2 days ago






            • 1




              Here is a regular expression that matches all permutations of $abc$: $abc+acd+bac+bca+cab+cba$. You can easily convert this to a DFA having $1+3+6+6+1 = 17$ states. The same works for $abcdefghij$.
              – Yuval Filmus
              2 days ago






            • 1




              What I understood: In theory, regular languages are able to accept permutations (so are regular expressions). There is just no "simple way" to write "permutation of abc" like »abc«. (For whatever reasons.)
              – Asqiir
              2 days ago






            • 1




              Yes, that's a good summary. I'll see if I can come up with a simpler argument for regular expressions.
              – Yuval Filmus
              2 days ago










            • I added a simple argument using the fooling set method.
              – Yuval Filmus
              2 days ago













            up vote
            7
            down vote










            up vote
            7
            down vote









            Let $Sigma$ be an alphabet of size $n$. A regular expression describing all permutations of $Sigma$ must have exponential size. This follows from Theorem 9 in Lower bounds for context-free grammars, which gives an exponential lower bound on the much stronger model of context-free grammars (a regular expression of size $m$ can be converted to a context-free grammar of size $O(m)$).



            So in some sense, there is no succinct way to specify all permutations of a word.





            Here is a simple proof for a $tildeOmega(2^n)$ lower bound on the size of a regular expression for the language of all permutations of an alphabet $Sigma$ of size $n$. Since a regular expression of size $m$ can be converted to an NFA with $O(m)$ states, it suffices to give a lower bound on NFAs.



            We will use the fooling set method, which we now describe. Let $L$ be a language, and let $(x_i,y_i)_{1 leq i leq N}$ be a collection of pairs of words such that:





            • $x_iy_i in L$.

            • If $i neq j$ then either $x_i y_j notin L$ or $x_j y_i notin L$.


            Then every NFA for $L$ contains at least $N$ states. Indeed, consider any NFA for $L$. For each $i$, consider some accepting computation of $x_iy_i$, and let $q_i$ be the state that the NFA is in after reading $x_i$ in this accepting computation. I claim that $q_i neq q_j$ for $i neq j$. Indeed, if $q_i = q_j$ then a "cut-and-paste" argument shows that both $x_iy_j$ and $x_jy_i$ are in $L$, contrary to assumption.



            Now we can prove the lower bound on NFAs for the language $L_n$ of all permutations of the symbols $sigma_1,ldots,sigma_n$; we assume for simplicity of exposition that $n$ is even. For each subset $S$ of $sigma_1,ldots,sigma_n$ of size $n/2$, let $x_S$ consist of all symbols in $S$ in some arbitrary order, and let $y_S$ consist of all symbols not in $S$ in some arbitrary order. Clearly $x_Sy_S in L_n$. Conversely, if $S neq T$ then $x_S y_T notin L_n$. This shows that every NFA for $L_n$ contains at least $binom{n}{n/2} = Omega(2^n/sqrt{n})$ many states.






            share|cite|improve this answer














            Let $Sigma$ be an alphabet of size $n$. A regular expression describing all permutations of $Sigma$ must have exponential size. This follows from Theorem 9 in Lower bounds for context-free grammars, which gives an exponential lower bound on the much stronger model of context-free grammars (a regular expression of size $m$ can be converted to a context-free grammar of size $O(m)$).



            So in some sense, there is no succinct way to specify all permutations of a word.





            Here is a simple proof for a $tildeOmega(2^n)$ lower bound on the size of a regular expression for the language of all permutations of an alphabet $Sigma$ of size $n$. Since a regular expression of size $m$ can be converted to an NFA with $O(m)$ states, it suffices to give a lower bound on NFAs.



            We will use the fooling set method, which we now describe. Let $L$ be a language, and let $(x_i,y_i)_{1 leq i leq N}$ be a collection of pairs of words such that:





            • $x_iy_i in L$.

            • If $i neq j$ then either $x_i y_j notin L$ or $x_j y_i notin L$.


            Then every NFA for $L$ contains at least $N$ states. Indeed, consider any NFA for $L$. For each $i$, consider some accepting computation of $x_iy_i$, and let $q_i$ be the state that the NFA is in after reading $x_i$ in this accepting computation. I claim that $q_i neq q_j$ for $i neq j$. Indeed, if $q_i = q_j$ then a "cut-and-paste" argument shows that both $x_iy_j$ and $x_jy_i$ are in $L$, contrary to assumption.



            Now we can prove the lower bound on NFAs for the language $L_n$ of all permutations of the symbols $sigma_1,ldots,sigma_n$; we assume for simplicity of exposition that $n$ is even. For each subset $S$ of $sigma_1,ldots,sigma_n$ of size $n/2$, let $x_S$ consist of all symbols in $S$ in some arbitrary order, and let $y_S$ consist of all symbols not in $S$ in some arbitrary order. Clearly $x_Sy_S in L_n$. Conversely, if $S neq T$ then $x_S y_T notin L_n$. This shows that every NFA for $L_n$ contains at least $binom{n}{n/2} = Omega(2^n/sqrt{n})$ many states.







            share|cite|improve this answer














            share|cite|improve this answer



            share|cite|improve this answer








            edited 2 days ago

























            answered 2 days ago









            Yuval Filmus

            187k12176337




            187k12176337












            • Does this mean 1) in theory it would be possible to let »abc« match all {abc, acb, bac, bca, cab, cba} but it's just not efficient and would make them too slow since »abc« would expand exponentially to (abc|acb|bac|bca|cab|cba)? or 2) The kind of automaton I need is not able to specify all permutations for a given word?
              – Asqiir
              2 days ago






            • 1




              Here is a regular expression that matches all permutations of $abc$: $abc+acd+bac+bca+cab+cba$. You can easily convert this to a DFA having $1+3+6+6+1 = 17$ states. The same works for $abcdefghij$.
              – Yuval Filmus
              2 days ago






            • 1




              What I understood: In theory, regular languages are able to accept permutations (so are regular expressions). There is just no "simple way" to write "permutation of abc" like »abc«. (For whatever reasons.)
              – Asqiir
              2 days ago






            • 1




              Yes, that's a good summary. I'll see if I can come up with a simpler argument for regular expressions.
              – Yuval Filmus
              2 days ago










            • I added a simple argument using the fooling set method.
              – Yuval Filmus
              2 days ago


















            • Does this mean 1) in theory it would be possible to let »abc« match all {abc, acb, bac, bca, cab, cba} but it's just not efficient and would make them too slow since »abc« would expand exponentially to (abc|acb|bac|bca|cab|cba)? or 2) The kind of automaton I need is not able to specify all permutations for a given word?
              – Asqiir
              2 days ago






            • 1




              Here is a regular expression that matches all permutations of $abc$: $abc+acd+bac+bca+cab+cba$. You can easily convert this to a DFA having $1+3+6+6+1 = 17$ states. The same works for $abcdefghij$.
              – Yuval Filmus
              2 days ago






            • 1




              What I understood: In theory, regular languages are able to accept permutations (so are regular expressions). There is just no "simple way" to write "permutation of abc" like »abc«. (For whatever reasons.)
              – Asqiir
              2 days ago






            • 1




              Yes, that's a good summary. I'll see if I can come up with a simpler argument for regular expressions.
              – Yuval Filmus
              2 days ago










            • I added a simple argument using the fooling set method.
              – Yuval Filmus
              2 days ago
















            Does this mean 1) in theory it would be possible to let »abc« match all {abc, acb, bac, bca, cab, cba} but it's just not efficient and would make them too slow since »abc« would expand exponentially to (abc|acb|bac|bca|cab|cba)? or 2) The kind of automaton I need is not able to specify all permutations for a given word?
            – Asqiir
            2 days ago




            Does this mean 1) in theory it would be possible to let »abc« match all {abc, acb, bac, bca, cab, cba} but it's just not efficient and would make them too slow since »abc« would expand exponentially to (abc|acb|bac|bca|cab|cba)? or 2) The kind of automaton I need is not able to specify all permutations for a given word?
            – Asqiir
            2 days ago




            1




            1




            Here is a regular expression that matches all permutations of $abc$: $abc+acd+bac+bca+cab+cba$. You can easily convert this to a DFA having $1+3+6+6+1 = 17$ states. The same works for $abcdefghij$.
            – Yuval Filmus
            2 days ago




            Here is a regular expression that matches all permutations of $abc$: $abc+acd+bac+bca+cab+cba$. You can easily convert this to a DFA having $1+3+6+6+1 = 17$ states. The same works for $abcdefghij$.
            – Yuval Filmus
            2 days ago




            1




            1




            What I understood: In theory, regular languages are able to accept permutations (so are regular expressions). There is just no "simple way" to write "permutation of abc" like »abc«. (For whatever reasons.)
            – Asqiir
            2 days ago




            What I understood: In theory, regular languages are able to accept permutations (so are regular expressions). There is just no "simple way" to write "permutation of abc" like »abc«. (For whatever reasons.)
            – Asqiir
            2 days ago




            1




            1




            Yes, that's a good summary. I'll see if I can come up with a simpler argument for regular expressions.
            – Yuval Filmus
            2 days ago




            Yes, that's a good summary. I'll see if I can come up with a simpler argument for regular expressions.
            – Yuval Filmus
            2 days ago












            I added a simple argument using the fooling set method.
            – Yuval Filmus
            2 days ago




            I added a simple argument using the fooling set method.
            – Yuval Filmus
            2 days ago










            Asqiir is a new contributor. Be nice, and check out our Code of Conduct.










             

            draft saved


            draft discarded


















            Asqiir is a new contributor. Be nice, and check out our Code of Conduct.













            Asqiir is a new contributor. Be nice, and check out our Code of Conduct.












            Asqiir is a new contributor. Be nice, and check out our Code of Conduct.















             


            draft saved


            draft discarded














            StackExchange.ready(
            function () {
            StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fcs.stackexchange.com%2fquestions%2f100206%2fwhy-is-there-no-permutation-in-regexes-even-if-regular-languages-seem-to-be-ab%23new-answer', 'question_page');
            }
            );

            Post as a guest















            Required, but never shown





















































            Required, but never shown














            Required, but never shown












            Required, but never shown







            Required, but never shown

































            Required, but never shown














            Required, but never shown












            Required, but never shown







            Required, but never shown







            Popular posts from this blog

            Volksrepublik China

            How to test boost logger output in unit testing?

            Write to the output between two pipeline