arXiv blog

How to Distinguish Fiction from Nonfiction

Telling fact from fiction isn't always easy on the Web. Now researchers have discovered a method that could help automate the process.

kfc 07/22/2010

Pick up a piece of text and start reading and it usually becomes clear pretty quickly whether you're reading a nonfictional news story or a fictional novel.

Some clues come from the environment in which the stories are found, such as the presence of headlines, standfirsts and crossheads.

But even the text alone is revealing. News stories, for example, have very specific structures that give writers little room for creative manoeuvre.

But pinning down these differences in a measurable way that a computer could use to tell the two apart is trickier.

Now Joseph Stevanak and Lincoln Carr at the Colorado School of Mines in Golden have come up with a way to do it. They say that the key is to look at the networks that form when you examine how often words appear close together in each type of text.

The type of network they examined is a graph in which each word in the text forms a vertex. A line connects two vertices if the corresponding words appear next to each other in the text. It is possible to explore longer-range links by also connecting vertices whose words appear two, three, four or more words apart.
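As a rough illustration of that construction, here is a minimal Python sketch. The tokenisation (lower-casing plus a simple regular expression), the function name build_word_network and the max_distance parameter are all assumptions made for this example; the paper may well handle these details differently.

```python
import re
from collections import defaultdict


def build_word_network(text, max_distance=1):
    """Build an undirected word-adjacency graph as {word: set of neighbours}.

    Hypothetical helper for illustration only. Two words are linked if they
    occur within max_distance positions of each other in the text.
    """
    # Assumed tokenisation: lower-case runs of letters and apostrophes.
    words = re.findall(r"[a-z']+", text.lower())
    graph = defaultdict(set)
    for i, word in enumerate(words):
        graph[word]  # make sure every word appears as a vertex
        for d in range(1, max_distance + 1):
            if i + d < len(words):
                other = words[i + d]
                if other != word:  # ignore self-links such as "very very"
                    graph[word].add(other)
                    graph[other].add(word)
    return dict(graph)


sample = "the cat sat on the mat"
adjacent = build_word_network(sample)                    # nearest neighbours only
wider = build_word_network(sample, max_distance=2)       # also words two apart
print(sorted(adjacent["the"]))
print(sorted(wider["on"]))
```

Raising max_distance adds the longer-range links described above, at the cost of a denser graph.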

Stevanak and Carr say that just two properties of this kind of network can help distinguish fiction from nonfiction stories. The first is the power law that describes the distribution of the number of links to each vertex in the network. The second is the clustering coefficient, which describes how tightly the neighbours of each vertex are connected to one another.
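To make those two quantities concrete, here is a hedged Python sketch that computes them for a toy graph. The graph format (a dict mapping each vertex to its set of neighbours), the function names, and the crude least-squares fit on a log-log degree histogram are assumptions for illustration, not the authors' procedure.

```python
import math
from collections import Counter


def average_clustering(graph):
    """Mean local clustering coefficient of an undirected graph.

    For a vertex with k neighbours, the local coefficient is the fraction of
    the k*(k-1)/2 possible neighbour pairs that are themselves linked.
    Vertices with fewer than two neighbours contribute zero.
    """
    if not graph:
        return 0.0
    total = 0.0
    for neighbours in graph.values():
        k = len(neighbours)
        if k < 2:
            continue
        nbrs = list(neighbours)
        links = sum(
            1
            for i in range(k)
            for j in range(i + 1, k)
            if nbrs[j] in graph.get(nbrs[i], ())
        )
        total += 2.0 * links / (k * (k - 1))
    return total / len(graph)


def degree_exponent(graph):
    """Crude power-law exponent: minus the slope of log(count) vs log(degree).

    A simple least-squares fit, used here only as an illustration.
    Returns None if there are fewer than two distinct non-zero degrees.
    """
    counts = Counter(len(n) for n in graph.values() if n)
    if len(counts) < 2:
        return None
    points = [(math.log(k), math.log(c)) for k, c in counts.items()]
    mean_x = sum(x for x, _ in points) / len(points)
    mean_y = sum(y for _, y in points) / len(points)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in points)
    var = sum((x - mean_x) ** 2 for x, _ in points)
    return -cov / var


# Toy graph: a triangle a-b-c with one extra vertex d hanging off a.
# Local coefficients are a: 1/3, b: 1, c: 1, d: 0.
toy = {"a": {"b", "c", "d"}, "b": {"a", "c"}, "c": {"a", "b"}, "d": {"a"}}
toy_clustering = average_clustering(toy)
toy_exponent = degree_exponent(toy)
print(toy_clustering, toy_exponent)
```

On a real text the degree histogram is noisy at high degree, so a straight-line fit like this is only a first look at whether the distribution resembles a power law.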

Measuring these two quantities alone can identify the type of story with remarkable accuracy. "Our analysis yielded a 73.8 ± 5.15% accuracy for the correct classification of novels and 69.1 ± 1.22% for news stories," say Stevanak and Carr.

This kind of analysis has the potential to improve future generations of text-finding algorithms so that they can better classify and hunt down the types of stories that individuals are looking for, and also identify the communities producing them.

And although it doesn't look like a Google-beater just yet, it has huge potential. If there's one place where the ability to distinguish fact from fiction may turn out to be useful, it is surely on the web.

Ref: arxiv.org/abs/1007.3254: Distinguishing Fact from Fiction: Pattern Recognition in Texts Using Complex Networks

UncleAl

409 Comments

  • 571 Days Ago
  • 07/22/2010

Disinformation's structure

Priority inputs for analysis are the officious utterances of Benjamin "BS" Bernanke, Hank "Hanky-Panky" Paulson, and Tim "Timmy!" Geithner. Next in line would be "Das Kapital," "Dianetics," and "The Book of Mormon." Mop up with "Chariots of the Gods" and "Worlds in Collision."

Let's get a neutral call here, then move on to bigger things.

doanwon

76 Comments

  • 571 Days Ago
  • 07/22/2010

Re: Disinformation's structure

While we're at it, put all the scientific publications containing outlandish theories about the universe through this algorithm. At the same stroke, weed out all the self-proclaimed omniscient authorities on these theories. Then we can start discussing science seriously.

While it's a good start, I would think that this sort of discriminator would have a hard time distinguishing biographies and autobiographies from historical fiction. With all the knowledge available on the web, more attention should be devoted to an intelligent agent embedded in the search engine, with enough basic knowledge to learn and remember how to categorize which site is newsworthy (like the major news companies) and which is not, and which is a fictional work vs. a nonfictional one.

shomas

245 Comments

  • 570 Days Ago
  • 07/23/2010

Re: Disinformation's structure

It needs to be said: differentiating the writing styles used in fiction versus nonfiction is not the same as telling truth from lies.

AKT

453 Comments

  • 571 Days Ago
  • 07/22/2010

Ideotitic method

This typical product of the land of instant experts would not detect that SR is rubbish. Just bunk. It is an inconsistent theory. We do not need these "experts" in the American university system to rip off the general public anymore. Enough is enough.

SR proves that v_g = c^2/w where v_g is the group speed of a wave and w is its phase speed. This means that unless v_g = w = c, either v_g or w has to exceed c. This contradicts SR's result that nothing moves faster than the speed of light. Moreover, a sound wave has v_g = w but not c.

This means that SR is a "BAD" fiction. Use your brain and you will see who is hallucinating more easily and accurately. These authors from the cowboy land of Colorado are certainly hallucinating.

Best regards,

AKT

doanwon

76 Comments

  • 571 Days Ago
  • 07/22/2010

Re: Ideotitic method

It's true the math shows phase speed to be faster than c (if I remember correctly from school).  But it cannot be used to hold and convey any information.  My old RF professor worked on utilizing this condition for practical purposes.  RIP Dr. TK Ishii.

Edit: When a microwave enters a waveguide at an angle that is not perpendicular to the plane of the waveguide's aperture, the phase velocity will be greater than c and group velocity is less than c.

AKT

453 Comments

  • 571 Days Ago
  • 07/22/2010

Re: Ideotitic method

It is an excuse to discuss what we can do with phase speed (or group speed). Logically, SR violates its own conclusion that nothing can move faster than the speed of light.

This is bad enough. On top of it, SR contradicts the well-known fact that a sound wave has the same value for group speed and phase speed. It is not c.

This is already more than enough to dump SR if physicists are equipped with a sane mind. Only mental people will prescribe a theory like this.

The axiom of SR deduces this contradiction, and it simply means that the theory of special relativity is bunk. No excuse will remove this contradiction. All the excuses made violate the theory of SR.

As an inconsistent theory can prove anything, SR causes contradictions everywhere, and many of them we have already seen. The only response we get from the defenders of SR is that those who question SR are cranks. Quite a response, is it not? Who is the real crank here?

Let me see, it is known that c^2*E^2-p is a relativistic invariant. Also E^2 = (m0*c^2)^2 -+c^2*p^2. Putting them together we conclude that E is a relativistic invariant. Even a middle-school maths student will see that SR is bunk.

After all, this corrupted theory starts with Einstein's thought experiments to frame it and it further develops into the light cone interpretation which undermines all of these thought experiments. This concludes that all such thought experiments are invalid as there is no causality between the emission of light and the reception of light.

I can present a mountain of contradictions coming out of SR. They think they are superior "revolutionaries". Normal people are inferior reactionaries according to them.

As far as normal minds are concerned, theoretical physics is either a madhouse or a cult.

Best regards

AKT

aka steve

9 Comments

  • 570 Days Ago
  • 07/23/2010

Dream big (Google beater)

Maybe license it to Google... and others,
a fiction/nonfiction filter would be good to separate subject from story during search.
Finding similarly styled authors is also a good way to package the program for use.
It will not be able to tell fact from fiction within a structured news article,
although it should be able to tell who is writing their own stories.
As for scientific theory, well... it will point out poor structure,
it might even indicate missing information or extra considerations.
In the end it can't stop the misdirection or the mad rant from being written.

Bio

The Physics arXiv Blog produces daily coverage of the best new ideas from an online forum called the Physics arXiv on which scientists post early versions of their latest ideas. Contact me at KentuckyFC @ arxivblog.com
