What Kind of Data Do Data Analysts Use: It’s Not Just Spreadsheets and SQL

You probably picture a data analyst sitting in a dark room, staring at green text on a black screen like something out of The Matrix. Or maybe you see someone drowning in Excel cells until their eyes cross. Honestly? Both are kinda wrong. If you’re asking what kind of data analysts actually work with, the answer is basically everything that leaves a digital footprint.

Data is messy. It’s loud. It’s often broken.

When a company like Netflix decides what show to greenlight, or when a hospital tries to predict patient readmission rates, they aren't just looking at one neat pile of numbers. They’re digging through a chaotic mix of structured tables, erratic social media rants, and invisible logs generated by servers in the middle of the night.

The Stuff That Fits in Boxes: Structured Data

Most people start here. Structured data is the "polite" data. It plays by the rules. It lives in Relational Database Management Systems (RDBMS) and sits comfortably in rows and columns.

Think about the last time you bought something on Amazon. Your customer ID is a number. The price is a decimal. The date is... well, a date. This is the bread and butter of the industry. Analysts use SQL (Structured Query Language) to talk to these databases. They’re pulling transactional data—sales records, inventory levels, and payroll. It’s predictable. Because it's so organized, it’s the easiest to analyze, but it only tells half the story. You know what happened, but rarely why.
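
To make the "rows and columns" idea concrete, here is a minimal sketch using Python's built-in sqlite3 module. The orders table, its columns, and the three sample rows are all made up for illustration; a real analyst would be querying a production database, not an in-memory one.

```python
import sqlite3

# In-memory database with a hypothetical orders table; every column has a fixed type.
conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE orders (
        order_id    INTEGER PRIMARY KEY,
        customer_id INTEGER NOT NULL,
        price       REAL    NOT NULL,
        order_date  TEXT    NOT NULL
    )
    """
)
# Made-up sample rows: (customer_id, price, ISO 8601 date string).
conn.executemany(
    "INSERT INTO orders (customer_id, price, order_date) VALUES (?, ?, ?)",
    [
        (101, 19.99, "2024-01-05"),
        (101, 5.00, "2024-01-09"),
        (102, 42.50, "2024-01-09"),
    ],
)

# The "what happened" question: order count and total spend per customer.
query = """
    SELECT customer_id, COUNT(*) AS n_orders, ROUND(SUM(price), 2) AS total_spent
    FROM orders
    GROUP BY customer_id
    ORDER BY customer_id
"""
rows = conn.execute(query).fetchall()
for customer_id, n_orders, total_spent in rows:
    print(customer_id, n_orders, total_spent)
```

Notice that the query answers "how much did each customer spend" in four lines, but nothing in the table can tell you why customer 102 only ordered once.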

The Wild West: Unstructured Data

Now we get to the headache-inducing stuff. Unstructured data is growing way faster than the structured kind. We’re talking about emails, PDFs, voice recordings, and those "open-ended" survey responses that everyone hates coding.

If you’re a data analyst at a place like Yelp, you aren’t just looking at the 1-to-5 star rating. That’s easy. The real gold is in the text of the review. "The chicken was dry, but the vibe was immaculate." How do you turn "immaculate vibe" into a data point? You use Natural Language Processing (NLP). Analysts have to take this mountain of text and find patterns. It’s incredibly resource-heavy.

Social media is the biggest culprit here. A single tweet contains the text, the timestamp, the geotag, and a list of who interacted with it. It’s a nightmare to clean, but it’s where the sentiment lives.
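
As a rough illustration of turning review text into a number, here is a toy word-counting scorer. It is nowhere near real NLP: the two word lists are hand-made assumptions, and a production pipeline would more likely use a trained model or a maintained lexicon.

```python
import re
from collections import Counter

# Hypothetical, hand-made word lists (an assumption for this sketch).
POSITIVE = {"immaculate", "great", "friendly", "delicious"}
NEGATIVE = {"dry", "slow", "rude", "cold"}


def score_review(text: str) -> dict:
    """Count positive and negative words and return a crude net score."""
    words = re.findall(r"[a-z']+", text.lower())
    counts = Counter(words)
    pos = sum(counts[w] for w in POSITIVE)
    neg = sum(counts[w] for w in NEGATIVE)
    return {"positive": pos, "negative": neg, "net": pos - neg}


review = "The chicken was dry, but the vibe was immaculate."
result = score_review(review)
print(result)
```

The Yelp-style review from earlier comes out with one positive hit, one negative hit, and a net score of zero, which is a decent reminder of why a single star rating hides so much.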

What Kind of Data Do Data Analysts Use When Things Get Weird?

Sometimes the data isn't even "content." It's just... behavior.

Clickstream data is a big one. It tracks every single move you make on a website. Did you hover over that "Buy Now" button for three seconds and then chicken out? An analyst knows. They use this to map out the "user journey." If 40% of people drop off at the shipping info page, there’s probably a bug or the shipping cost is too high.
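
Here is one way the drop-off calculation might look, as a sketch. The funnel steps, session IDs, and events are invented for the example, and real clickstream data would arrive with timestamps and far more noise.

```python
from collections import defaultdict

# Hypothetical funnel steps and (session_id, page) events.
FUNNEL = ["product", "cart", "shipping", "payment"]
events = [
    ("s1", "product"), ("s1", "cart"), ("s1", "shipping"), ("s1", "payment"),
    ("s2", "product"), ("s2", "cart"), ("s2", "shipping"),
    ("s3", "product"), ("s3", "cart"),
    ("s4", "product"), ("s4", "cart"), ("s4", "shipping"), ("s4", "payment"),
    ("s5", "product"),
]


def funnel_dropoff(events, funnel):
    """Return (step, sessions_reached, share_lost_before_next_step) per step."""
    pages_by_session = defaultdict(set)
    for session_id, page in events:
        pages_by_session[session_id].add(page)

    # How many distinct sessions saw each step at least once.
    reached = [
        sum(1 for pages in pages_by_session.values() if step in pages)
        for step in funnel
    ]
    report = []
    for i, step in enumerate(funnel):
        if i + 1 < len(funnel) and reached[i] > 0:
            lost = 1 - reached[i + 1] / reached[i]
        else:
            lost = None  # last step, or nobody reached this step
        report.append((step, reached[i], lost))
    return report


report = funnel_dropoff(events, FUNNEL)
for step, n, lost in report:
    print(step, n, "n/a" if lost is None else f"{lost:.0%}")
```

In this made-up sample, a third of the sessions that reach the shipping page never make it to payment, which is exactly the kind of number that sends an analyst looking for a bug or a pricing problem.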

Then there’s IoT (Internet of Things) data. Your smart fridge, the sensors on a factory floor, or the GPS on a delivery truck. This data is "streaming." It never stops. Analysts working with IoT data often have to deal with time-series analysis, looking for anomalies in a constant flow of pings. If a turbine in a wind farm starts vibrating at a frequency that’s 0.5% off the norm, an analyst needs to catch that before it turns into a mechanical failure.
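
A very simple version of that kind of check can be sketched as a rolling baseline: compare each new reading against the average of the last few and flag anything more than 0.5% away. The window size, the readings, and the function name are assumptions for illustration; real monitoring systems tend to use more robust statistics.

```python
from collections import deque
from statistics import mean


def find_anomalies(readings, window=5, tolerance=0.005):
    """Yield (index, value) for readings that deviate from the mean of the
    previous `window` readings by more than `tolerance` (relative).

    Assumes readings are positive, as a frequency in Hz would be.
    """
    recent = deque(maxlen=window)
    for i, value in enumerate(readings):
        if len(recent) == window:
            baseline = mean(recent)
            if abs(value - baseline) / baseline > tolerance:
                yield i, value
                continue  # keep the outlier out of the baseline
        recent.append(value)


# Hypothetical vibration frequencies in Hz.
readings = [50.00, 50.01, 49.99, 50.00, 50.02, 50.01, 50.40, 50.00, 49.98]
anomalies = list(find_anomalies(readings))
print(anomalies)
```

Because it is a generator that only remembers the last few values, the same function works on an endless stream just as well as on a short list. Here it flags only the 50.40 reading at index 6.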


Why the Source Matters More Than the Format

Analysts don't just categorize data by how it looks (structured vs. unstructured). They care about where it came from. This dictates how much they can trust it.

First-Party Data: The Good Stuff

This is data the company owns. If you work for Nike, your first-party data is what people buy on Nike.com. It’s clean-ish. You know exactly how it was collected. It’s your most valuable asset because you don't have to pay a middleman for it, and you aren't guessing about its accuracy.

Second and Third-Party Data: The Necessary Evils

Second-party data is basically someone else’s first-party data that you bought or traded for. Maybe a hotel chain and an airline swap loyalty program info.

Third-party data is the controversial one. This is the stuff collected by aggregators like Acxiom or Nielsen. It’s huge, it’s broad, and honestly, it’s getting harder to use. With browsers like Safari and Firefox blocking third-party cookies by default and privacy laws like GDPR and CCPA in force, analysts are having to pivot. You can’t just buy a list of "people who like cats and live in Ohio" as easily as you used to.

Quantitative vs. Qualitative

  • Quantitative: How many? How often? How much? It’s all about the numbers.
  • Qualitative: Why? How did it feel? What was the motivation?

A good analyst balances both. If the "Quantitative" data says sales are down, the "Qualitative" data (like customer interviews or focus group transcripts) tells you the product's new packaging looks like laundry detergent.


The Tools That Dictate the Data

The tool often limits the data. You aren't going to process 50 million rows of clickstream data in Excel. A worksheet tops out at 1,048,576 rows, so the file won't even load in full.

  1. SQL: The language of databases. If you can't write a JOIN, you aren't getting to the structured data.
  2. Python/R: These are for the heavy lifting. Analysts use libraries like pandas (Python) or the tidyverse (R) to clean and reshape messy data.
  3. Tableau/Power BI: These are for the "pretty" part. They take the data and turn it into something a CEO can understand in five seconds.
  4. Hadoop/Spark: For when the data is so big it needs to be spread across twenty different servers.
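
To show why the tool matters, here is a sketch of how a script can get through a file that a spreadsheet could never open: read it one row at a time and keep only a running tally. The column names and the tiny in-memory sample are assumptions standing in for a real multi-gigabyte export.

```python
import csv
import io
from collections import Counter


def count_page_views(lines):
    """Stream rows one at a time and tally views per page.

    Memory use depends on the number of distinct pages, not the number of rows.
    """
    reader = csv.DictReader(lines)
    views = Counter()
    for row in reader:
        views[row["page"]] += 1
    return views


# Stand-in for a huge file. In practice you would pass an open file handle,
# for example open("clickstream.csv", newline="") (a hypothetical path).
sample = io.StringIO(
    "session_id,page\n"
    "s1,home\n"
    "s1,cart\n"
    "s2,home\n"
)
views = count_page_views(sample)
print(views.most_common())
```

The same idea, scaled up and spread across machines, is roughly what tools like Spark automate for you.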

The Human Element: Metadata and Dark Data

Here is a secret: a lot of the data analysts use is actually "data about data." That's metadata. If you have a photo, the metadata is the shutter speed, the GPS location, and the date it was taken. For a data analyst, metadata is the map. Without it, you’re just looking at a pile of numbers with no context.
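
Reading photo metadata takes a third-party library, so as a stand-in this sketch pulls file-system metadata instead, using only Python's standard library. The file name and contents are made up; the point is that you learn a fair amount about a file without opening it.

```python
import tempfile
from datetime import datetime, timezone
from pathlib import Path


def describe_file(path: Path) -> dict:
    """Return metadata about a file without reading its contents."""
    info = path.stat()
    modified = datetime.fromtimestamp(info.st_mtime, tz=timezone.utc)
    return {
        "name": path.name,
        "extension": path.suffix,
        "size_bytes": info.st_size,
        "modified_utc": modified.isoformat(),
    }


with tempfile.TemporaryDirectory() as tmp:
    sample = Path(tmp) / "export.csv"  # hypothetical file name
    sample.write_bytes(b"id,amount\n1,9.99\n")
    meta = describe_file(sample)

print(meta)
```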

Then there’s Dark Data.

IBM once estimated that roughly 80% of data is "dark." It’s the stuff companies collect but never actually use. Old server logs, discarded zip files, notes from employees who quit three years ago. Modern data analysts are increasingly tasked with "mining" this darkness to find hidden efficiencies. It’s like digital archaeology.

Real-World Example: The Retail Turnaround

Let's look at a hypothetical (but very realistic) retail chain. They're losing money.

The analyst starts with structured data: sales receipts. They see that coats aren't selling.

Then they look at weather data (external data). It was a warm winter. Okay, that's a factor.

Then they dig into unstructured data: social media mentions. They find people are complaining that the coats look "outdated" compared to a competitor.

Finally, they look at sensor data from the stores. They realize people are picking up the coats, carrying them to the fitting room, and then leaving them there.

The Insight: The coats look good on the rack, but the fit is terrible once you put them on. The "what" was low sales. The "why" required four different types of data working together.


How to Actually Start Using This Information

If you’re looking to get into this field or just trying to understand what your team is talking about, stop obsessing over the "big" in Big Data. Focus on the variety.

  • Audit your inputs. Look at your own business or project. What are you ignoring? Are you only looking at your Shopify dashboard while ignoring the 500 unread customer support emails? That’s unstructured data waiting to be used.
  • Check your data hygiene. Data is only as good as its collection. If your sales team is "guesstimating" lead scores, your analysis will be garbage. Garbage in, garbage out.
  • Learn a bit of SQL. Even if you aren't a "tech person," knowing how to query a table changes how you think about information. It forces you to see the world in relationships and logic.
  • Start small with Python. If you have a folder full of 200 PDFs, don't read them. Use a script to extract the text. That’s data analysis in its purest form: turning chaos into a signal.
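
The "unread support emails" audit from the list above might start with something like this sketch. It uses Python's built-in email module; the three raw messages and the list of complaint topics are invented, and a real version would read from a mailbox export instead.

```python
import re
from collections import Counter
from email import message_from_string

# Hypothetical raw messages; real ones would come from a mailbox export.
RAW_EMAILS = [
    "From: a@example.com\nSubject: Late delivery\n\n"
    "My order arrived late and the box was damaged.",
    "From: b@example.com\nSubject: Refund\n\n"
    "I want a refund. The delivery was late.",
    "From: c@example.com\nSubject: Thanks\n\n"
    "Great service, thank you!",
]

# Hypothetical complaint topics to look for.
TOPICS = ("late", "damaged", "refund")


def count_topics(raw_emails, topics):
    """Count how many emails mention each topic at least once (subject or body)."""
    counts = Counter()
    for raw in raw_emails:
        msg = message_from_string(raw)
        text = f"{msg['Subject']} {msg.get_payload()}".lower()
        words = set(re.findall(r"[a-z]+", text))
        counts.update(t for t in topics if t in words)
    return counts


topic_counts = count_topics(RAW_EMAILS, TOPICS)
print(topic_counts.most_common())
```

Even a crude count like this turns a pile of unread messages into a ranked list of what customers are actually upset about.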

Data isn't just a commodity anymore; it’s the actual fabric of how decisions get made. Whether it's a timestamp on a server or a literal heart rate monitor's output, the "kind" of data matters less than the question you're trying to answer with it. Understand the format, sure, but master the context. That’s where the real power lies.

Next Steps for You:
Begin by identifying one source of "dark data" in your current workflow—something you collect but never look at. Use a basic tool like Excel's Power Query or a simple Python script to see if there's a trend hidden in those ignored logs. Once you've identified a pattern, compare it against your primary sales or performance metrics to see if there's a correlation you've been missing.
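
If you want a starting point for that exercise, here is a minimal sketch: count errors per day in some old logs, then check how those counts move against daily sales. The log format, the log lines, and the sales figures are all made up, and three data points prove nothing; this only shows the shape of the script.

```python
import re
from collections import Counter
from math import sqrt

# Assumed log format: "YYYY-MM-DD HH:MM:SS LEVEL message".
LOG_LINE = re.compile(r"^(?P<date>\d{4}-\d{2}-\d{2}) \S+ (?P<level>[A-Z]+) ")


def errors_per_day(lines):
    """Count ERROR-level lines per calendar day."""
    counts = Counter()
    for line in lines:
        m = LOG_LINE.match(line)
        if m and m["level"] == "ERROR":
            counts[m["date"]] += 1
    return counts


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sqrt(sum((x - mx) ** 2 for x in xs))
    sy = sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)


# Made-up logs and sales figures, for illustration only.
logs = [
    "2024-03-01 09:00:01 INFO checkout ok",
    "2024-03-01 09:05:12 ERROR payment timeout",
    "2024-03-02 10:11:00 ERROR payment timeout",
    "2024-03-02 10:12:30 ERROR payment timeout",
    "2024-03-02 11:00:00 INFO checkout ok",
    "2024-03-03 08:45:10 ERROR payment timeout",
    "2024-03-03 08:46:10 ERROR payment timeout",
    "2024-03-03 08:47:10 ERROR payment timeout",
]
daily_sales = {"2024-03-01": 120, "2024-03-02": 95, "2024-03-03": 80}

errors = errors_per_day(logs)
days = sorted(daily_sales)
r = pearson([errors[d] for d in days], [daily_sales[d] for d in days])
print(dict(errors), round(r, 2))
```

A strongly negative value here would suggest that error-heavy days are also low-sales days, which is a lead worth investigating rather than a conclusion.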