Is Microsoft running out of steam? – The H Open Source: News and Features
Blog
-
tables in PDF files – are there any that you would like to get hold of?
If so, please leave a comment here or drop me a line at my e-mail address!
I can help you.
-
Letter case – Wikipedia, the free encyclopedia
“snake case” – I had not come across this term before today, but this is my preferred letter case.
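For anyone who has not met the term either: snake case means lowercase words joined by underscores. Here is a small sketch in Python of how a name could be converted to it. The function name to_snake_case is my own invention, and the rules it applies (split at lowercase-to-uppercase boundaries, treat spaces and hyphens as separators) are only one reasonable convention, not an official definition.

```python
import re


def to_snake_case(name: str) -> str:
    """Convert a camelCase, PascalCase or spaced name to snake_case."""
    # Put an underscore between a lowercase letter or digit and an uppercase letter.
    s = re.sub(r"(?<=[a-z0-9])(?=[A-Z])", "_", name)
    # Split a run of capitals before its last letter, e.g. "PDFFile" -> "PDF_File".
    s = re.sub(r"(?<=[A-Z])(?=[A-Z][a-z])", "_", s)
    # Turn spaces and hyphens into underscores, then lowercase everything.
    return re.sub(r"[\s\-]+", "_", s).lower()
```

With these rules, "letterCase" becomes "letter_case" and "snake case" becomes "snake_case".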
-
questions and answers on Apache Ant by Roseanne Zhang
questions and answers on Apache Ant (the software tool for automating software build processes) by Roseanne Zhang
-
The Social Network (2010) – IMDb
That’s the movie about Mark Zuckerberg, the “inventor” and creator of Facebook.
From Trivia:
- The opening breakup scene with Jesse Eisenberg and Rooney Mara ran eight script pages and took 99 takes.
- “Who was the movie star?” – “Does it matter?” – the movie star was, in fact, Natalie Portman (Born: Natalie Hershlag), who was enrolled at Harvard from 1999 to 2003 and helped screenwriter Aaron Sorkin by providing him insider information about goings-on at Harvard at the time Facebook first appeared there.
- The Winklevoss twins were both played by actor Armie Hammer. However, Ralph Lauren model Josh Pence played one of them strictly from the neck down. His face was digitally replaced with Hammer’s to make them appear identical, as the two men are unrelated and look nothing alike. The two spent 10 months in twin boot camp to match one another’s subtle movements and rapport.
From Quotes:
- “As if every thought that tumbles through your head was so clever it would be a crime for it not to be shared.”
- “You’re not an asshole, Mark. You’re just trying so hard to be one.”
Update after watching the movie:
I think the Winklevoss twins should not have gotten any money.
My impression is that they just discussed a business idea with Zuckerberg, and discussing such an idea without a non-disclosure agreement isn’t really worth anything.
They should not have gotten money or shares from Zuckerberg or Facebook; that was wrong.
I think it’s very sad how the friendship between Mark Zuckerberg and Eduardo Saverin evolved, but then that’s how it goes with “business partnerships”. You have to make sure to stay in very, very close contact with your partners, otherwise you run the risk of getting catapulted out of the game. Of course it’s especially sad to see that “Zuckerberg dropped Saverin’s 30% ownership share of Facebook down to 0.03%” (from en.wikipedia.org/wiki/Eduardo_Saverin#Personal_life_and_Facebook).
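Justin Timberlake … plays Napster creator Shawn Fanning as a slightly delusional, paranoid entrepreneur (from technofunkie’s review of this movie on IMDb). Note that Timberlake’s character is actually Sean Parker, Napster’s co-founder; Shawn Fanning is the one who created Napster.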
I am very, very grateful to my supporter friend, who allowed me to watch this movie.
-
TableSeer.SourceForge.net
TableSeer | Download TableSeer software for free at SourceForge.net
TableSeer is a tool that automatically identifies tables in digital documents and extracts the contents in the cells of the tables as well as table metadata.
That software seems to apply more heuristics than pdf2table.
-
my wifi access point TP-LINK TL-WA901ND
I am employing another wifi access point (that’s the device whose name I am using here in the title) and a different WPA encrypted wifi network (SSID) for my neighbours, and today I thought I should have a look at it again.
I was tempted to do a firmware upgrade on the device, not really the latest one, but one from this year, and after rebooting
- I noticed, the “System” LED kept blinking slowly,
- and I also couldn’t access it any more.
I feared these 2 symptoms were related, and that made me a little nervous.
I forced a reset by pressing the reset button for 5 seconds, and applied the settings again that I had applied several months ago. Everything is fine now.
The “System” LED is still blinking slowly, but the manual doesn’t cover this case and I also can’t notice any real problem that this might indicate, so I simply ignore it.
Just to make sure that my lovely neighbours do not successfully invite anybody to my wifi network by themselves, today I enabled MAC address whitelisting on that wifi network, and I added the MAC addresses of all of their computers that I am aware of. MAC address whitelisting is not the only security feature I apply to protect myself and my own computers. I also route their IP packets through a different VLAN on a managed switch, a Netgear FS526T, but that actually belongs in a different blog article.
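The access point does the whitelisting in its own firmware, so there is nothing to program here. Still, the idea behind it can be sketched in a few lines of Python. Everything in this sketch is my own illustration: the function names normalize_mac and is_allowed are made up, and the two addresses are invented, not the ones of my neighbours. Keep in mind that a MAC address can be spoofed, which is one reason not to rely on whitelisting alone.

```python
def normalize_mac(mac: str) -> str:
    """Return a MAC address as twelve lowercase hex digits without separators."""
    digits = mac.replace(":", "").replace("-", "").replace(".", "").lower()
    if len(digits) != 12 or any(c not in "0123456789abcdef" for c in digits):
        raise ValueError(f"not a MAC address: {mac!r}")
    return digits


# Hypothetical addresses, made up for illustration only.
WHITELIST = {normalize_mac(m) for m in ("00:11:22:33:44:55", "66-77-88-99-AA-BB")}


def is_allowed(mac: str) -> bool:
    """True if the client's MAC address is on the whitelist."""
    return normalize_mac(mac) in WHITELIST
```

Normalizing first means that the same address is recognized no matter whether a client reports it with colons, hyphens or dots.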
Actually this device is not only a wifi access point. You can also operate it in several other modes, but right now I am not in the mood to describe that.
-
on PDF
Nowadays on the Web or through e-mail you are getting more and more PDF files as electronic documents instead of documents on paper.
Roughly speaking, PDF documents are expected to display the same way on every computer platform (as opposed to documents created by usual word processing software). This is regarded as a major advantage of PDF.
PDF vs. fonts vs. platform (in)dependence vs. resizability/scalability
Whenever a PDF document makes use of outline fonts and stroke fonts as opposed to bitmap fonts (see the Wikipedia article on computer fonts!), you are able to resize or rescale your document to different sizes without suffering any loss of quality in the fonts used. This is in general considered another major advantage.
But computer fonts are not in the public domain, so on every computer platform, different available fonts are used for PDF documents.
So what can we do against platform dependency stemming from fonts?
- Include the fonts: that’s the approach used by PDF/A.
PDF/A is especially employed where documents need to remain available even after many years, as in document archives.
The major downside of this approach: PDF/A documents are much, much bigger than usual PDF documents, because storing the fonts within them takes a lot of space.
- Another approach is to render text and fonts into ready-made bitmaps.
Of course documents of this kind display best with a 1:1 relationship of the pixels in your documents to the pixels on your screen or on your printer output.
Any resizing or rescaling results in poor quality.
And I think you understand this very well: there is no text (as text) left at all in your PDF document, and you will not be able to extract any text from such a document.
Now you know: different kinds of PDF documents come with different advantages and also disadvantages.
I am interested here in PDF documents that are not rendered into “one bitmap per page”, but which rather contain the source document’s text. Extracting that text simply as text is more or less a piece of cake, and software for this purpose already exists.
PDF basics
Before I dive with you into what information we want to extract from PDF files, I want to explain PDF a little.
I am honestly not too deep into PDF, but I understand it as an advanced and optimized version of PostScript. My little knowledge of PostScript is (please find a slightly lengthier version in the Wikipedia article!):
- It’s a stack-based programming language like Forth, using reverse Polish notation.
- It has data structures like arrays and dictionaries, but nothing more abstract than that.
- Subprograms are called, or rather regarded as, operators of the stack machine.
- Some relevant information details may be coded into operator names.
- Some other relevant information details (like page numbers) are coded into comment lines, see the article on PostScript Document Structuring Conventions. I have no clue what corresponds to that in PDF. Maybe there are language elements for that.
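To make “stack-based” and “reverse Polish notation” a bit more concrete, here is a tiny sketch of such a stack machine in Python. It is not PostScript and covers only a handful of operators. The operator names add, sub, mul, dup and exch do exist in PostScript, but the function run and the way it is built are purely my own illustration.

```python
def run(program: str) -> list:
    """Evaluate a whitespace-separated postfix program and return the final stack."""
    stack = []

    def binary(fn):
        # Build an operator that pops two operands and pushes one result.
        def op():
            b = stack.pop()  # the top of the stack is the second operand
            a = stack.pop()
            stack.append(fn(a, b))
        return op

    def exch():
        # Swap the two topmost stack entries.
        b = stack.pop()
        a = stack.pop()
        stack.extend([b, a])

    operators = {
        "add": binary(lambda a, b: a + b),
        "sub": binary(lambda a, b: a - b),
        "mul": binary(lambda a, b: a * b),
        "dup": lambda: stack.append(stack[-1]),
        "exch": exch,
    }

    for token in program.split():
        if token in operators:
            operators[token]()  # an operator works on the stack
        else:
            # anything else is taken to be a number and gets pushed
            stack.append(float(token) if "." in token else int(token))
    return stack
```

For example, run("3 4 add 2 mul") first pushes 3 and 4, replaces them with 7, pushes 2 and finally leaves [14] on the stack. The operands always come first and the operator last, which is all that reverse Polish notation means.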
Now you have an idea of what PDF looks like, and you may have a vague idea of what is possible with PDF and what isn’t.