Regular expression with iterator











I am trying to scrape prices from an online commerce store. I iterate through the products on the page and include the product number in the regular expression. Despite escaping the curly brackets, the regular expression does not work (findall returns an empty list).



HTML code returned by soup.findAll:



[<div class="ps4-price at-min-price-1"> from 29 GBP </div>]
[<div class="ps4-price at-min-price-2"> from 35 GBP </div>]


Python code:



for product in range(21):
    min_prices_text = str(soup.findAll("div", class_="ps4-price at-min-price-{}".format(product)))
    min_price = re.findall('<div class="ps4-price at-min-price-{{}}"> (.+?)<'.format(product), str(min_prices_text))








  • Try min_prices = soup.find_all("div", class_="ps4-price") and then arr = [], for el in min_prices:, arr.append(re.sub(r'\D+', '', el.string)) => print(list(map(int, arr))). If you need to make sure there are both classes listed, try min_prices = soup.find_all("div", class_=re.compile(r"ps4-price at-min-price-\d+"))
    – Wiktor Stribiżew
    2 days ago








  • Maybe don't use regex to parse HTML content.
    – Tim Biegeleisen
    2 days ago










  • BTW, your formatted string is broken, {{}} is actually a couple of literal braces. You need to use single ones, {}, there.
    – Wiktor Stribiżew
    2 days ago
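
The brace behaviour mentioned in the last comment can be checked in isolation. The sketch below uses only the standard library; the `html` string is copied from the question's sample output, and the variable names (`broken`, `fixed`, and so on) are illustrative, not from the original post.

```python
import re

# Sample markup copied from the question's output.
html = '<div class="ps4-price at-min-price-1"> from 29 GBP </div>'
product = 1

# In str.format, "{{}}" is an escaped pair of literal braces, so nothing is substituted.
broken = '<div class="ps4-price at-min-price-{{}}"> (.+?)<'.format(product)
print(broken)   # <div class="ps4-price at-min-price-{}"> (.+?)<

# A single "{}" is a replacement field and receives the product number.
fixed = '<div class="ps4-price at-min-price-{}"> (.+?)<'.format(product)
print(fixed)    # <div class="ps4-price at-min-price-1"> (.+?)<

broken_matches = re.findall(broken, html)
fixed_matches = re.findall(fixed, html)
print(broken_matches)  # []
print(fixed_matches)   # ['from 29 GBP ']
```

The broken pattern looks for the literal text `at-min-price-{}`, which never occurs in the page, which would explain the empty list.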















regex python-3.x






asked 2 days ago by dzakob





1 Answer
You can access the .string property of the elements returned by findAll and apply the regex only to the plain text. For example, since you expect a single integer in each element, you can apply re.sub(r'\D+', '', min_prices_text.string) to those strings.
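
As a quick check of that substitution on the text alone, here is a minimal sketch using only the `re` module. The first two sample strings mirror the text inside the question's `div` elements; the decimal price is a made-up case (not from the original page) to show a limitation, and the `re.search` variant is an assumed alternative rather than part of the answer.

```python
import re

# Text content of the two sample divs from the question.
texts = [" from 29 GBP ", " from 35 GBP "]

# \D+ matches runs of non-digit characters; removing them leaves only the digits.
digits = [re.sub(r"\D+", "", t) for t in texts]
print(digits)                   # ['29', '35']
print(list(map(int, digits)))   # [29, 35]

# Hypothetical decimal price: stripping non-digits also drops the decimal point.
stripped = re.sub(r"\D+", "", " from 29.99 GBP ")
print(stripped)                 # 2999

# Assumed alternative: capture the number explicitly instead of deleting everything else.
match = re.search(r"\d+(?:\.\d+)?", " from 29.99 GBP ")
price = float(match.group()) if match else None
print(price)                    # 29.99
```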



See example code:



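import re

# soup is the BeautifulSoup object from the question
results = []
for product in range(21):
    min_prices_text = soup.find("div", class_="ps4-price at-min-price-{}".format(product))
    if min_prices_text:
        # Strip every non-digit character, leaving only the number
        results.append(re.sub(r'\D+', '', min_prices_text.string))

print(results) # => ['29', '35']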


Or use list(map(int, results)) if you want to convert the list of strings to integers.
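
Putting the suggestions together, the following is a minimal end-to-end sketch, assuming the `beautifulsoup4` package is installed. The `html` string is a stand-in built from the two sample elements in the question plus one made-up non-matching `div`. Matching on the single class `at-min-price-N` with a regular expression is one possible way to avoid looping over product numbers, not necessarily what the real page requires.

```python
import re
from bs4 import BeautifulSoup

# Stand-in document: the first two divs come from the question's sample output,
# the third is a hypothetical element that should be ignored.
html = """
<div class="ps4-price at-min-price-1"> from 29 GBP </div>
<div class="ps4-price at-min-price-2"> from 35 GBP </div>
<div class="other"> from 99 GBP </div>
"""
soup = BeautifulSoup(html, "html.parser")

# A compiled pattern passed to class_ is tested against each class of the element.
price_divs = soup.find_all("div", class_=re.compile(r"^at-min-price-\d+$"))

prices = []
for div in price_divs:
    text = div.get_text()               # e.g. " from 29 GBP "
    digits = re.sub(r"\D+", "", text)   # keep only the digits
    if digits:                          # skip elements without a number
        prices.append(int(digits))

print(prices)  # [29, 35]
```

Because the elements are selected by pattern rather than by a fixed `range(21)`, this version does not depend on knowing how many products the page lists.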






answered 2 days ago by Wiktor Stribiżew





















