makingstuffs 4 days ago | next |

Looks cool from what I’ve seen of it (I read the README and poked through your code). Well done on the release.

I’d be interested to see how this does against sites which have things like Cloudflare’s bot detection enabled and sites such as Google Trends.

In my experience the stock version of Playwright doesn’t play too well with them, and for sites like Google Trends there are a lot of pain points when trying to liberate data.

Still interesting nonetheless.

krsma 3 days ago | root | parent | next |

Hi, creator here.

So our open-source version does not provide Cloudflare bypass or captcha support. It is impossible to offer a robust system for that as completely FOSS. But we do have it available in our cloud version (which we are launching soon; it is currently in testing). Our open-source version allows you to BYOP (Bring Your Own Proxy) to handle all the bypassing. Our OSS version is being used by users with small to medium scraping needs :)

makingstuffs 3 days ago | root | parent |

That makes perfect sense. I know from experience that scraping things like Google Trends is an absolute pain in the backside and requires a lot of jiggery-buggery to get around their bot detection; a proxy alone will do nothing to bypass it, due to the mechanisms they’ve implemented.

techn00 4 days ago | root | parent | prev |

It doesn't. You can't find any reference to captcha solving or anything like that in the repo. There's one line for the stealth plugin, and it is commented out:

// chromium.use(stealthPlugin());

imo this is the hardest part of scraping: evading bot detection and captchas.

edit: and keeping the scraping logic & rules up to date

krsma 3 days ago | root | parent |

Hi, creator here.

So our open-source version does not provide Cloudflare bypass or captcha support. It is impossible to offer a robust system for that as completely FOSS. But we do have it available in our cloud version (which we are launching soon; it is currently in testing). Our open-source version allows you to BYOP (Bring Your Own Proxy) to handle all the bypassing. Our OSS version is being used by users with small to medium scraping needs :)

All of this is mentioned in our README.md.

vivzkestrel 4 days ago | prev | next |

What happens if a captcha pops up on one of the websites you are trying to scrape? Your documentation mentions captcha, but are we talking specifically about Google reCAPTCHA or hCaptcha, or any captcha provider?

krsma 3 days ago | root | parent |

Hi, creator here.

So our open-source version does not provide Cloudflare bypass or captcha support. It is impossible to offer a robust system for that as completely FOSS. But we do have it available in our cloud version (which we are launching soon; it is currently in testing). Our open-source version allows you to BYOP (Bring Your Own Proxy) to handle all the bypassing. Our OSS version is being used by users with small to medium scraping needs :)

rickm20203 3 days ago | prev |

Great stuff. I have set it up on my local machine. I believe it has huge potential.

Also, why are people commenting about captcha and Cloudflare when the docs/README mention that they offer a cloud version for bypassing them?

Bypassing bot detection requires money. I am sure the creator is not Elon Musk and cannot provide it for free to everyone.