Scrape and Swipe: Content Thieves

Site scraping beans

I’m wired a little differently than you, my friends. Maybe I’m a little odd, and you see my madness, but you also see my gift. I have an incredible gift, and without sounding too stuck up (I’m really not), it’s the gift of words. I can tell you stories. I stand out, apart from the crowd, the observer, but certainly not as antisocial as I might seem. I’m researching everything around me, because I am a writer. I am often distracted by life, waiting on pins and needles with twitchy fingers. I am poised, waiting for the opportunity to write. My tools are pen and paper, keyboards and ideas. I live to weave words together to send the stories to you, dear readers. No podcasts, videos, etc. for me, at least not yet. Writing is my way of communicating to the world, of saying the things that would otherwise remain buried deep within.

I belong to a special tribe – you may be amongst my comrades: writers, artists, composers. We are the ones who function outside the box, and we are a part of something bigger. We talk to ourselves, scribbling on receipts, envelopes and even our forearms and legs (assuming you live somewhere warm). As artists, we are addicted to journals, notepads, and favored writing and sketching implements. For me, it’s all about the ink color. You are nodding in agreement, smiling because you are the same. We writers are constantly listing, notating, typing cryptic notes in our phones, writing everywhere and anywhere. We are of the same tribe, and inspired by everything.

And now, some very bad news. We write, we post, we share. We create online books, links and blogs, imparting our homegrown, creative wisdom to our followers, hoping for more; but somebody wants our words. The blood of our souls, carefully crafted into sentences and ideas; our original thoughts plundered by those unsavory characters trolling the World Wide Web. Well, my fellow bloggers, writers and artists, we may as well post a classified ad.

Calling all lazy, sneaky content thieves! Web and site scrapers, look here! Steal my content, get rich quick!

Once the ad has been shared, we might think about posting a small infomercial. Something like this:

Dishonest, lying thieves: Did you know that you can make money stealing other people’s hard work? No long hours, soul leeching or actual research or writing required. You can swipe the content off somebody else’s existing website. Yes, you can! Just stop by for a quick visit and duplicate the unsuspecting writer’s creative, personalized content. Get some ads! Steal their visitors. Raise your stats with a spammy stolen site. And all free today. Act now and receive a special bonus.

OK. So you’re thinking this chick’s lost her mind. Nobody does this, right? WRONG! This practice is pretty rampant on the internet right now and is known as Site or Web Scraping. Ever searched Google for some info and ended up on a site where the main article is buried in a sea of ads and spam? It’s unclear to you who or what the site is, and there is no author listed or credited. You have now had the pleasure of visiting a scraped version of someone’s site.

Web or site scraping, sometimes referred to as web harvesting, is a software technique that uses “bots” to extract information from websites. Google can be considered a content scraper, but it’s the evil hacking ones you need to worry about.

Malicious web scraping has become way too common. It affects many types of sites, but most commonly those related to digital publishing and social media. My blogging buddies, we are under attack. In the past two weeks this very site has been scraped. Yes, you Ukrainian content thieving hacks – I see you! So now what?

There are ways to protect yourself, a little tightening of the security belt, but no real permanent solutions (yet) for prohibiting scrapers. So how can you tell if your site is being attacked? Though I’m flattered by the Ukrainian devotion and attention, it is MY writing. I thought about it, researched it, felt it, organized and composed it, and put it up to share with my adoring fans. Yes, you, dear readers. (Don’t fail me now!)

Alright, so how did I find out? Being so new to the blogging world, and maybe a bit overexcited about my beans being out there, I check my stats. A lot. On the past two days when my new posts were shared, my site traffic jumped. I was jumping with happy, joyful squeals to have had over 2000 page reads. Then I noticed something. Sixty-eight visitors did not read 1487 pages. I don’t even have that many pages, so something is rotten in the virtual state of Denmark. I looked, I investigated, and I found you out. Evil scraping site snatchers – I wish you static in your internet connections and viruses galore!

What’s a girl to do but call her hosting company? This is how I found out that this is a prevalent issue on the net. Fellow writers, bloggers, Etsy shop creative sorts and fellow Creative Superheroes – Beware! The best advice I can offer is to monitor your stats and check those IP addresses coming in. If your traffic looks too superb and wonderful to be true, it just may be. Now, ever vigilant and protective (I’m calling on my mama bear instincts), when I see those thieving IP addresses I can block them from my site. Of course they’ll probably be back with a new one, so I will attempt to figure out their patterns so I can preempt their next attack.
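For the code-curious, here is a rough sketch of that kind of stats check. It assumes you can export your traffic as a list of (IP address, page) pairs; the function name, the threshold of 50 pages and the sample addresses are all made up for illustration, so treat it as a starting point and not a finished tool. For a sense of scale, 68 visitors reading 1487 pages works out to roughly 22 pages apiece, which is a lot for a small blog.

```python
from collections import Counter


def flag_heavy_visitors(requests, max_pages_per_ip=50):
    """Return {ip: page_count} for IPs that requested more pages than the threshold.

    `requests` is an iterable of (ip, path) tuples. The threshold of 50 is an
    arbitrary assumption; tune it to what normal traffic looks like on your site.
    """
    counts = Counter(ip for ip, _path in requests)
    return {ip: n for ip, n in counts.items() if n > max_pages_per_ip}


# Made-up sample data: one address hammering the site, two ordinary visitors.
sample = [("203.0.113.7", f"/post-{i}") for i in range(120)]
sample += [("198.51.100.4", "/"), ("198.51.100.4", "/about"), ("192.0.2.15", "/")]

suspects = flag_heavy_visitors(sample)
print(suspects)  # {'203.0.113.7': 120}
```

Any address that shows up in the result is a candidate for a closer look, and possibly for your block list.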

The unfortunate truth is that if you have awesome content (and we all do), and your site presents information that can be viewed in a regular browser or found through a search engine like Yahoo or Google, you’re at risk. If it’s easy for the average Jane to read, that same content can be nicked, swiped, stolen and reposted elsewhere. No credit of authorship, ownership or links to your original piece. It sucks, right? I did some homework, and though it’s just the tip of the info, perhaps we can all try (note that I say try) to stop the scrapers.

  • Rate limit individual IP addresses. Basically, block the ones making an unreasonable number of requests to your site, or making them too fast.
  • If you monitor your stats carefully, you can block the nasty buggers through their IP addresses. Watch for patterns and attempt to preempt.
  • Use something called a CAPTCHA. I’m still unsure what this is, but apparently it separates the bots from the humans.
  • Be tricky. Embed information inside PDFs and media to make it more difficult for thieves to cull your content. It is much easier for them when it’s a simple string of text. (And yes, this will be a job for my webmaster extraordinaire.)
  • If you are exceptionally talented and know how to program or write code, change the HTML on a regular basis. If your site is ever changing and inconsistent, it will be confusing to those content burglars. I am not so proficient, and keep in mind that it’s not necessary to completely redesign. Just change it up a bit.
  • This one I’m still researching. There’s something called a honeypot page. Wake your inner Pooh bear and create (or have your web designer create) some of these pages. They are designed specifically to trap robots and web crawlers. (Sorry Spidey, best stick to NY.)
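For those who do write code, the first tip on the list can be sketched in a few lines. This is only an illustration of the idea, assuming a simple sliding-window approach; the class name, the limits and the sample address are hypothetical, and a real site would more likely lean on its hosting company, web server or a security plugin to do this.

```python
import time
from collections import defaultdict, deque


class RateLimiter:
    """Allow at most `max_requests` per IP within a sliding time window."""

    def __init__(self, max_requests=10, window_seconds=1.0):
        # Both limits are illustrative assumptions, not recommended values.
        self.max_requests = max_requests
        self.window_seconds = window_seconds
        self._hits = defaultdict(deque)  # ip -> timestamps of recent allowed requests

    def allow(self, ip, now=None):
        """Return True if this request is within the limit, False if it should be blocked."""
        if now is None:
            now = time.monotonic()
        hits = self._hits[ip]
        # Forget requests that have aged out of the window.
        while hits and now - hits[0] >= self.window_seconds:
            hits.popleft()
        if len(hits) >= self.max_requests:
            return False
        hits.append(now)
        return True


# Made-up example: at most 3 requests per second from one address.
limiter = RateLimiter(max_requests=3, window_seconds=1.0)
results = [limiter.allow("203.0.113.7", now=t) for t in (0.0, 0.1, 0.2, 0.3, 1.5)]
print(results)  # [True, True, True, False, True]
```

The fourth request arrives too fast and is refused; once the window has passed, the same address is allowed in again. A human reader rarely trips a limit like this, while a bot pulling every page at once does.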

Words flow through my arteries, and I am a unique and wonderful writer. (Yes, yes, I’m gloating and patting myself on the back. Someone has to!) My original content is created constantly, day and night. Like you, fellow bloggers, I would hate seeing my hard work on a spam site maybe only hours after I’ve posted. I have seen in my research that some poor victims complain that the spam site using their stolen content ranked higher than their own, and/or had more site traffic and unique visitors. I don’t have to describe the devastating feelings of frustration that follow.

Content scraping is a massive issue today, with someone using your work without permission, outranking your site and perhaps even monetizing your content and stealing your visitors. You can report them, if you can find them. Block them. Visit their site and look for a contact page. Legally you can demand that they remove your content. WordPress has some tips, but keep in mind the article is old. It may not be compatible with the latest upgraded versions. There is a website, www.copyscape.com, that checks the URL of your content and finds duplicates if they’re out there. It’s basically a plagiarism checker. I believe the first few checks are free.
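If you already have a suspicious page in hand and want a rough do-it-yourself comparison, the idea behind a plagiarism check can be sketched like this. To be clear, this is not how Copyscape works (I don’t know its internals); it is a simple, assumed approach that compares overlapping three-word chunks of two texts, and the function names and sample sentences are made up.

```python
import re


def shingles(text, size=3):
    """Break text into a set of overlapping word groups ("shingles")."""
    words = re.findall(r"[a-z0-9']+", text.lower())
    if len(words) < size:
        return {tuple(words)} if words else set()
    return {tuple(words[i:i + size]) for i in range(len(words) - size + 1)}


def similarity(a, b, size=3):
    """Jaccard similarity of the two texts' shingle sets: 0.0 (no overlap) to 1.0 (identical)."""
    sa, sb = shingles(a, size), shingles(b, size)
    if not sa or not sb:
        return 0.0
    return len(sa & sb) / len(sa | sb)


original = "Stay caffeinated and safe my friends"
copied = "stay caffeinated and safe, my friends!"
unrelated = "The quick brown fox jumps over the lazy dog"

print(similarity(original, copied))     # 1.0
print(similarity(original, unrelated))  # 0.0
```

A score near 1.0 means the other page is carrying your words almost verbatim, even if the punctuation and capitalization were changed.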

Anybody out there aware of these thieving, black-hearted content bandits? Any creative solutions or tried and true methods to prevent scraping? Please leave comments and suggestions. Let’s share some creative genius morning beans.

Stay caffeinated and safe, my friends.☕️

6 Comments

  1. Incredible what they can do nowadays. Thanks for sharing your experience and giving such actionable tips to help.

  2. According to IBM, 100,000 new strains of malware are distributed by more than 10,000 malicious new domains each day. Most attacks are an attempt to compromise an organization’s cyber security.
