U.S. CitiesCity-Data Forum Index
Go Back   City-Data Forum > General Forums > Science and Technology > Computers
 [Register]
Please register to participate in our discussions with 2 million other members - it's free and quick! Some forums can only be seen by registered members. After you create your account, you'll be able to customize options and access all our 15,000 new posts/day with fewer ads.
View detailed profile (Advanced) or search
site with Google Custom Search

Search Forums  (Advanced)
 
Old 01-06-2015, 10:17 PM
 
Location: I live wherever I am.
1,935 posts, read 3,846,213 times
Reputation: 3258

Advertisements

The backstory: I have very strong "basic programming skills", insofar as I became fluent in programming on four languages which are quite antiquated by now, as far as I know - Logo, Pascal, TIBasic, and C. I used to be able to write programs that would blow people away, especially in those first three languages. However, I haven't written a program of any sort since 2001. So I have some questions about what can be programmed onto computers these days.


Let's start with Microsoft Excel. I'd love to hear from people who are fluent in programming Excel macros.

Can I write a macro to query an Excel database and only return the records that match a particular query?

Can macros also add data to databases?

Can I write a macro to query a database and change the background color of rows in the database whose data satisfy the query?

I already have a good sense of how to program a computer... in my mind I know how I would do it if I used the outdated languages I mentioned... I just need to know what the limits of Excel macros are. If I know the limits, I can learn the language and make it do whatever I want it to do within those limits.


Now some questions about Web programming.

I would like to write a program that crawls certain websites, searches for a particular text string, and then adds the data that follows the text string to a database. This data will not be "personal" or "identifying" to anyone in any way, nor will it be used for anything other than the optimization of our business. It is readily available and public. Simply put, I don't want to be bothered spending countless hours manually entering the data. I'm certain that this can be done, as I have heard of "bot" programs that harvest e-mail addresses from websites and add them to spam mailing lists. How can I write a similar program? What language would I need to use? What database management system would I have to use? Could it happen as simply as to transfer the data to an Excel database, or is that wishful thinking?

Simply put, if YOU were going to do what I just mentioned, how would you do it?
Reply With Quote Quick reply to this message

 
Old 01-07-2015, 07:35 AM
 
Location: Wandering.
3,545 posts, read 5,682,491 times
Reputation: 2664
On the Excel side:

Excel uses VBA (Visual Basic for Applications) for macros and scripting. VBA is a variant of VB6, and is very ancient itself (circa '98). Excel doesn't really have a DB (everything is in worksheets), but VBA is pretty powerful for what it is, and can do quite a lot. You can certainly iterate over the data in rows and cells, add and remove data, and manipulate the cell colors of rows and columns. I'm not sure that you can actually query the data in VBA, but I know that MSQuery will allow you to write queries for Excel (I have little experience with this, and it was a very long time ago).

If VBA fails you, there is a COM interface into all of Office, and you can manipulate most of it from outside programing languages that have more power.

If I was assigned this project today: If you need to interact with an external DB, then I'd write a standalone application to do the work (using C#). If it's all within Excel, and very simplistic, I might do it in VBA (but I really dislike using VBA due to its lack of power compared to full languages << This is just me though, and there's nothing wrong with VBA if it does what you need).


As for scraping data from web sites: most programing languages can do this, it's just a matter of finding the correct tools within that language.

If I was going to scrape a web site (I actually have a desktop application that does this to capture recipes from some cooking sites), I'd personally write a C# desktop app (since it's the language / framework that I'm most fluent in for desktop apps), add a library specifically designed for parsing HTML (HtmlAgilityPack in this case), and go to town.


As for a DB. On the Windows Desktop, SQL Express is a good choice IMO. It's free and full featured, and can be migrated to full SQL if you needs outgrow the limits of the free version.


If you want to learn a new language for doing this, I'd take a look at Either VB.NET (if you have a preference for Basic languages), or C# (if you have a preference for C languages). Both are languages on top of Microsoft's .NET framework, and have similar feature sets. Both can fully manipulate Office, both can build full blown applications (desktop / web / services / etc.), and both have a ton of tutorials, and support available on the web (C# has a larger community, more samples and support, and usually gets new features first, so if you are familiar with C languages I'd lean that direction).

The tutorials below will start you at the very beginning, including downloading Visual Studio (the IDE for the languages):

Visual Basic Fundamentals for Absolute Beginners | Channel 9

C# Fundamentals for Absolute Beginners | Channel 9
Reply With Quote Quick reply to this message
 
Old 01-07-2015, 08:48 AM
 
24,503 posts, read 35,979,772 times
Reputation: 12852
I was just thinking about whether there is ever a good reason to use jquery anymore. Scraping a website seems like the perfect use case.

Is create a chrome or Firefox plugin if scraping a small number of pages. Or use a headless browser, such as phantom or zombie for larger jobs. If you want to speed up the job, you can use nightwatch with saucelabs and just spin up several instances.
Reply With Quote Quick reply to this message
 
Old 02-14-2015, 02:13 PM
 
Location: (six-cent-dix-sept)
4,914 posts, read 2,508,232 times
Reputation: 3152
I'm not good with ms-excel but maybe what you are trying to achieve can be accomplished with tabbed delimited text files or possibly mysql if a true database makes sense. (I never heard of ms-excel being referred to as a database, I think the ms-office program for databases would be ms-access -- unless you really meant spreadsheet. sorry if I misunderstood)

I use wget/curl for web scrapping and parsing.
Reply With Quote Quick reply to this message
 
Old 02-14-2015, 02:29 PM
 
Location: East of Seattle since 1992, originally from SF Bay Area
30,884 posts, read 56,281,260 times
Reputation: 32899
Most of what the OP described can be done with Sharepoint, using data from an Excel file, or going directly to a SQL database. It seems like too many Excel macros are wasted time now, with even many "users" able to write a SQL script.
Reply With Quote Quick reply to this message
Please register to post and access all features of our very popular forum. It is free and quick. Over $68,000 in prizes has already been given out to active posters on our forum. Additional giveaways are planned.

Detailed information about all U.S. cities, counties, and zip codes on our site: City-data.com.


Reply
Please update this thread with any new information or opinions. This open thread is still read by thousands of people, so we encourage all additional points of view.

Quick Reply
Message:

Over $104,000 in prizes was already given out to active posters on our forum and additional giveaways are planned!

Go Back   City-Data Forum > General Forums > Science and Technology > Computers
Similar Threads
Follow City-Data.com founder on our Forum or

All times are GMT -6. The time now is 08:05 PM.

© 2005-2019, Advameg, Inc. · Please obey Forum Rules · Terms of Use and Privacy Policy · Bug Bounty

City-Data.com - Archive 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35 - Top